In this chapter, we’ll give a demonstration of how you can formally specify a database design. Formalizing a database design specification has the advantage of avoiding any ambiguity in the documentation of not only the database structure but, even more importantly, of all involved data integrity constraints.
Note Within the IT industry, the term business rules is often used to denote what this book refers to as data integrity constraints. However, because a clear definition of what exactly is meant by business rules is seldom given, we cannot be sure about this. In this book, we prefer not to use the term business rules, but instead use data integrity constraints. In this chapter, we’ll give a clear definition of the latter term.
We’ll give the formal specification of a database design by defining the data type of a database variable. This data type—essentially a set—holds all admissible database states for the database variable and is dubbed the database universe.
You’ll see that a database universe can be constructed in a phased (layered) manner, which along the way provides us with a clear classification schema for data integrity constraints.
First, you define what the vocabulary is. What are the things, and aspects of these things in the real world, that you want to deal with in your database? Here you specify a name for each table structure that is deemed necessary, and the names of the attributes that the table structure will have. We’ll introduce an example database design to demonstrate this. The vocabulary is formally defined in what is called a database skeleton. A good way to further explain the meaning of all attributes (and their correlation) is to provide the external predicate for each table structure; this is a natural language sentence describing the meaning and correlation of the involved attributes.
Given the database skeleton, we then define for each attribute the set of admissible attribute values. This is done by introducing a characterization for each table structure. You were introduced to the concept of a characterization in Chapter 4.
You’ll then use these characterizations as building blocks to construct the set of admissible tuples for each table. This is called a tuple universe, and includes the formal specification of tuple constraints.
Then, you’ll use the tuple universes to build the set of admissible tables for each table structure. This set is called a table universe, and can be considered the data type of a table variable. The definition of a table universe will include the formal specification of the relevant table constraints.
The last section of this chapter shows how you can bring together the table universes in the definition of the set of admissible database states, which was the goal set out for this chapter: to define a database universe. In this phase you formally specify the database (multi-table) constraints.
Because the example database universe presented in this chapter has ten table structures, we’ll introduce you to ten characterizations, ten tuple universes, and ten table universes. This, together with the explanatory material provided, makes this chapter a rather big one. However, the number of examples should provide you with a solid head start to applying the formal theory, and thereby enable you to start practicing this methodology in your job as a database professional. You can find a version of this example database design specification that includes the design’s bare essentials in Appendix A.
After the “Chapter Summary” section, a section with exercises focuses primarily on changing or adding constraint specifications in the various layers of the example database universe introduced in this chapter.
Because you’re reading this book, you consider yourself a database professional. Therefore, it’s likely that the activity of specifying database designs is part of your job. You’ll probably agree that the process of designing a database roughly consists of two major tasks:
- The design of the various table structures, each with their attributes (that is, the database skeleton)
- The specification of the data integrity constraints that come with these table structures
The application of the math introduced in Part 1 of this book is primarily geared to the second task; it enables you to formally specify the data integrity constraints. We’re convinced that whenever you design a database, you should spend the biggest part of your time on designing the involved data integrity constraints. Accurately—that is, unambiguously—documenting these data integrity constraints can spell the difference between your success and failure.
Still, today documenting data integrity constraints is most widely done using natural language, which often produces a quick dive into ambiguity. If you use plain English to express data integrity constraints, you’ll inevitably hit the problem of how the English sentence maps, unambiguously, into the table structures. Different programmers (and users alike) will interpret such sentences differently, because they all try to convert these into something that will map into the database design. Programmers then code their perception of the constraint (not necessarily the specifier’s).
The sections that follow will demonstrate that the logic and set theory introduced in Part 1 lends itself excellently to capturing database designs with their integrity constraints in a formal manner. Formal specifications of data integrity constraints tell you exactly how they map into the table structures. You’ll not only avoid the ambiguity mentioned earlier, but moreover you’ll get a clear and expert view of the most important aspect of a database: all involved data integrity constraints.
Note Some of you will be surprised, by the example that follows, at how much of the overall specification of an information system actually sits in the specification of the database design. A lot of the “business logic” involved in an information system can often be represented by data integrity constraints that map into the underlying table structures that support the information system.
Having set the scene, we’ll now demonstrate how set theory and logic enable you to get a clear and professional view of a database design and its integrity constraints. The next two sections introduce you (informally) to a particular way of looking at the quintessence of a database design. This view is such that it will enable a layered set-theory specification of a database design.
A database (state) at any given point in time is essentially a set of tables. Our database, or rather our database variable, holds the current database state. In the course of time, transactions occur that assign new database states to the database variable. We need to specify the set of all admissible database states for our database variable. This set is called the database universe, and in effect defines the data type for the database variable. Viewed top down, within the database universe for a given database design that involves, say, n table structures, you can observe the following:

- The database universe is a set of admissible database states.
- Every database state is a set of n tables (one per table structure), where every table is an admissible set of tuples.
- Every tuple in such a table is an admissible set of attribute-value pairs, where every attribute value is drawn from an admissible attribute value set.

Because all preceding layers are sets, you can define them all mathematically using set theory. Through logic (by adding embedded predicates) you define exactly what is meant by admissible in each layer; here the data integrity constraints enter the picture.
So how do you specify, in a formal way, this set called the database universe? This is done in a bottom-up approach using the same layers introduced earlier. First, you define what your vocabulary is: what are the things, and aspects of them in the real world, that you want to deal with in your database? In other words, what table structures do you need, and what attributes does each table structure have? This is formally defined in what is called a database skeleton.
For each attribute introduced in the database skeleton, you then define the set of admissible attribute values. You’ve already been introduced to this; in this phase all characterizations (one per table structure) are defined.
You then use the characterizations as building blocks to build (define) for each table structure the set of admissible tuples. This involves applying the generalized product operator (see Definition 4-7) and the introduction of tuple predicates. The set of admissible tuples is called a tuple universe.
You can then use the tuple universes to build for each table structure the set of admissible tables, which is called a table universe. You’ll see how this can be done in this chapter; it involves applying the powerset operator and introducing table predicates.
In the last phase you define the set of admissible database states—the database universe—using the previously defined table universes.
This methodology of formally defining the data type of a database variable was developed by the Dutch mathematician Bert De Brock together with Frans Remmen in the 1980s, and is an elegant method of accurately defining a database design, including all relevant data integrity constraints. The references De grondslagen van semantische databases (Academic Service, 1990, in Dutch) and Foundations of Semantic Databases (Prentice Hall, 1995) are books written by Bert De Brock in which he introduces this methodology.
In this bottom-up solid construction of a database universe, you explicitly only allow sets of admissible values at each of the levels described earlier. This means that at each level these sets must satisfy certain data integrity constraints. The constraints specify which sets are valid ones; they condition the contents of the sets. This leads straightforwardly to four classes of data integrity constraints:
- Attribute constraints involve a single attribute in isolation; they condition the set of admissible values for that attribute.
Note We’ll revisit this matter in Chapter 11 when the attribute value sets are implemented in an SQL database management system.
- Tuple constraints involve multiple attributes of the same tuple. Take, for example, the attributes Job and Salary of an EMP (employee) table structure: “Employees with job President earn a monthly salary greater than 10000 dollars.”
- Table constraints involve multiple tuples of the same table (take, for example, a Manager attribute in the EMP table structure that references the employee’s manager).
- Database constraints involve multiple tables. Take, for example, the EMP and DEPT table structures: each employee must work for a known department.

These four classes of constraints accept or reject a given database state. They condition database states and are often referred to as static (or state) constraints; they can be checked within the context of a (static) database state. In actuality there is one more constraint class. This is the class of constraints that limit database state transitions (on grounds other than the static constraints). Predicates specifically conditioning database state transitions are referred to as dynamic (or state transition) constraints. We’ll cover these separately in Chapter 8.
Because the preceding classification scheme is driven by the scope of data that a constraint deals with, it has the advantage of being closely related to implementation issues of constraints. When you implement a database design in an SQL DBMS, you’ll be confronted with these issues, given the poor declarative support for data integrity constraints in these systems. This lack of support puts the burden upon you to develop often complex code that enforces the constraints. Chapter 11 will investigate these implementation challenges of data integrity constraints using the classification introduced here.
We’ll demonstrate the application of the theory presented in Part 1 of this book through an elaborate treatment of a database design that consists of ten table structures.
We comment up front that this database design merely serves as a vehicle to demonstrate the formal specification methodology; it is explicitly not our intention to discuss why the design is as it is. We acknowledge that some of the assumptions on which this design is based could be questionable. We also mention up front that this design has two hacks, which some of you will probably consider rather horrible. We’ll indicate these when they are introduced.
Figure 7-1 shows a diagram of the ten table structures (represented by boxes) and their mutual relationships (represented by arrows). Each of the arrows indicates a subset requirement predicate that is applicable between a pair of table structures.
Note The majority of these arrows represent what is often called many-to-one relationships and will eventually end up as foreign keys during the implementation phase in an SQL DBMS. However, this need not always be the case, as you will see. The exact meaning of each arrow will be given in the database universe specification where each arrow translates to a database constraint.
Our database holds employees (EMP) and departments (DEPT) of a company. Some of the arrows indicate the following:

- Every employee’s salary grade must be a grade that is listed in the salary grade table structure (GRD).
- Employee history (HIST) records are maintained for all salary and/or “works-for-department” changes; every history record describes a period during which one employee was assigned to a department with a specific salary.
- We hold additional information for all sales representatives in a separate table structure (SREP).
- We hold additional information for employees who no longer work for the company (that is, they have been terminated or they resigned) in TERM. Note that we keep the EMP information for terminated employees.
- We also hold additional information for all managed employees (MEMP); that is, employees who have a manager assigned to them.
The database further holds information about courses (CRS), offerings (OFFR) of those courses, and registrations (REG) for those course offerings. Some more arrows show the following:

- Every offering is an offering of a course listed in CRS.
- Every registration is a registration for a course offering listed in OFFR.
- Trainers of course offerings, as well as students registering for them, are employees listed in EMP.
You now have some idea of the information we’re maintaining in this database. In the next section, you’ll find the database skeleton. As mentioned before, it introduces the names of all attributes for every table structure. Together with the table structure names, they form the vocabulary that we have available in our database design.
The names of the things in the real world that we are representing in our database design, including the names of the attributes of interest, are introduced in what is called a database skeleton. We sometimes refer to this as the conceptual skeleton. As you saw in Chapter 5, a database skeleton is represented as a set-valued function. The domain of the skeleton function is the set of table structure names. For each name, this function yields the set of attribute names of that table structure; that is, the heading of that table structure.
Our database skeleton DB_S for the example database design is defined in Listing 7-1. Inside the specification of DB_S you see embedded comments (/* ... */) to clarify further the chosen abbreviations for the table structure and attribute names.
DB_S := { (EMP; -- Employees
{ EMPNO /* Employee number */
, ENAME /* Employee name */
, JOB /* Employee job */
, BORN /* Date of birth */
, HIRED /* Date hired */
, SGRADE /* Salary grade */
, MSAL /* Monthly salary */
, USERNAME /* Username */
, DEPTNO } ) /* Department number */
, (SREP; -- Sales Representatives
{ EMPNO /* Employee number */
, TARGET /* Sales target */
, COMM } ) /* Commission */
, (MEMP; -- Managed Employees
{ EMPNO /* Employee number */
, MGR } ) /* Manager: employee number */
, (TERM; -- Terminated Employees
{ EMPNO /* Employee number */
, LEFT /* Date of leave */
, COMMENTS } ) /* Termination comments */
, (DEPT; -- Departments
{ DEPTNO /* Department number */
, DNAME /* Department name */
, LOC /* Location */
, MGR } ) /* Manager: employee number */
, (GRD; -- Salary Grades
{ GRADE /* Grade code */
, LLIMIT /* Lower salary limit */
, ULIMIT /* Upper salary limit */
, BONUS } ) /* Yearly bonus */
, (CRS; -- Courses
{ CODE /* Course code */
, DESCR /* Course description */
, CAT /* Course category */
, DUR } ) /* Duration of course in days */
, (OFFR; -- Course Offerings
{ COURSE /* Code of course */
, STARTS /* Begin date of this offering */
, STATUS /* Scheduled, confirmed, ... */
, MAXCAP /* Max participants capacity */
, TRAINER /* Trainer: employee number */
, LOC } ) /* Location */
, (REG; -- Course Registrations
{ STUD /* Student: employee number */
, COURSE /* Course code */
, STARTS /* Begin date course offering */
, EVAL } ) /* Evaluation */
, (HIST; -- Employee History Records
{ EMPNO /* Employee number */
, UNTIL /* History record end date */
, DEPTNO /* Department number */
, MSAL } ) } /* Monthly salary */
Given the database skeleton, you can now write expressions such as DB_S(DEPT), which represents the set of attribute names of the DEPT table structure. The expression denotes the set {DEPTNO, DNAME, LOC, MGR}.
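The skeleton lends itself to a direct translation into code. The following Python sketch (our own illustration, not part of the formal method) models DB_S as a set-valued function, using a dict whose values are the headings (sets of attribute names) from Listing 7-1:

```python
# Hypothetical sketch: the database skeleton DB_S as a Python dict.
# A set-valued function maps each table structure name to its heading,
# i.e. the set of its attribute names.
DB_S = {
    'EMP':  {'EMPNO', 'ENAME', 'JOB', 'BORN', 'HIRED',
             'SGRADE', 'MSAL', 'USERNAME', 'DEPTNO'},
    'SREP': {'EMPNO', 'TARGET', 'COMM'},
    'MEMP': {'EMPNO', 'MGR'},
    'TERM': {'EMPNO', 'LEFT', 'COMMENTS'},
    'DEPT': {'DEPTNO', 'DNAME', 'LOC', 'MGR'},
    'GRD':  {'GRADE', 'LLIMIT', 'ULIMIT', 'BONUS'},
    'CRS':  {'CODE', 'DESCR', 'CAT', 'DUR'},
    'OFFR': {'COURSE', 'STARTS', 'STATUS', 'MAXCAP', 'TRAINER', 'LOC'},
    'REG':  {'STUD', 'COURSE', 'STARTS', 'EVAL'},
    'HIST': {'EMPNO', 'UNTIL', 'DEPTNO', 'MSAL'},
}

# The expression DB_S(DEPT) in the text corresponds to DB_S['DEPT'] here
heading_of_dept = DB_S['DEPT']
```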
With this definition of the table headings, you’re now developing some more sense of what each table structure in our database design is all about—what it intends to represent. A way to clarify further the meaning of the table structures and their attributes is to provide the external predicates. An external predicate is an English sentence that involves all attributes of a table structure and supplies a statement regarding these attributes that explains their interconnected meaning. Following is the external predicate for the EMP table structure:
The employee with employee number EMPNO has name ENAME, job JOB, was born on BORN, is hired on HIRED, has a monthly salary of MSAL dollars within the SGRADE salary grade, is assigned to account USERNAME and works for the department with department number DEPTNO.
It is called external because a database management system cannot deal with this English sentence. It is meant for the (external) users of the system, and supplies an interpretation of the chosen names for the attributes. It is called a predicate because you can view this English sentence as being parameterized, where the parameters are the embedded attribute names. You can instantiate the external predicate using the tuples in the current EMP table. You do this by replacing every occurrence of an attribute name inside the sentence with the corresponding attribute value within a given tuple. The new sentence formed this way can be viewed as a proposition that can either yield TRUE or FALSE. Sentences generated in this way by the external predicate are statements about the real world represented by the table. By convention, the propositions that are constructed in this way are assumed to be TRUE. This is precisely how external predicates further clarify the meaning of your database design.
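This instantiation process is mechanical enough to sketch in code. The following Python fragment is a hypothetical illustration of ours (the instantiate helper is not part of the formal method): it substitutes a tuple’s attribute values into the external predicate of EMP.

```python
# The external predicate of EMP; the uppercase attribute names act as
# parameters embedded in the English sentence.
external_predicate_EMP = (
    "The employee with employee number EMPNO has name ENAME, job JOB, "
    "was born on BORN, is hired on HIRED, has a monthly salary of MSAL dollars "
    "within the SGRADE salary grade, is assigned to account USERNAME and "
    "works for the department with department number DEPTNO."
)

def instantiate(predicate, tup):
    # Replace longer attribute names first so that no attribute name that is a
    # substring of another one gets clobbered prematurely.
    for attr in sorted(tup, key=len, reverse=True):
        predicate = predicate.replace(attr, str(tup[attr]))
    return predicate

emp_tuple = {'EMPNO': 1042, 'ENAME': 'SMITH', 'JOB': 'TRAINER',
             'BORN': '1970-11-02', 'HIRED': '1999-04-01', 'SGRADE': 4,
             'MSAL': 4500, 'USERNAME': 'JSMITH', 'DEPTNO': 10}

# The resulting sentence is a proposition about the real world, assumed TRUE.
proposition = instantiate(external_predicate_EMP, emp_tuple)
```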
Table 7-1 lists the external predicates for all table structures introduced in the skeleton.
Note Have you spotted the two hacks? Apparently there are two sorts of offerings: offerings with a trainer assigned and offerings without one assigned. A similar remark can be made about registrations; some of them include an evaluation score for the course offering, and some of them don’t. In a properly designed database, you should have decomposed the offering and registration table structures into two table structures each.
These external predicates give you an informal head start with regards to the meaning of all involved table structures and their attributes that were introduced by the database skeleton. The exact meaning of this example database design will become clear as we progress through all formal phases of a database universe definition in the sections that follow.
The next section will supply a characterization for each table structure introduced in the skeleton.
As you saw in Chapter 4, a characterization defines the attribute value sets for the attributes of a given table structure. For a given table structure, the characterization is a set-valued function whose domain is the set of attributes of that table structure. For each attribute, the characterization yields the attribute value set for that attribute. The characterizations form the base on which the next section will build the tuple universes. You’ll then notice that the way these characterizations are defined here is very convenient. Take a look at Listing 7-2. It defines the characterization for the EMP table.
Note A few notes:

- In defining the attribute value sets for the EMP table, we are using the shorthand names for sets that were introduced in Table 2-4.
- We use chr_<table structure name> as a naming convention for the characterization of a table structure.
- In the definition of chr_EMP (and in various other places) you’ll see a function called upper. This function accepts a case-sensitive string and returns the uppercase version of that string.
chr_EMP :=
{ ( EMPNO; [1000..9999] )
, ( ENAME; varchar(9) )
, ( JOB; /* Five JOB values allowed */
{'PRESIDENT','MANAGER','SALESREP',
'TRAINER','ADMIN'} )
, ( BORN; date )
, ( HIRED; date )
, ( SGRADE; [1..99] )
, ( MSAL; { n | n∊number(7,2) ∧ n > 0 } )
, ( USERNAME; /* Usernames are always in uppercase */
  { s | s∊varchar(15) ∧ upper(s) = s } )
, ( DEPTNO; [1..99] )
}
For every attribute of table structure EMP, function chr_EMP yields the attribute value set for that attribute. You can now write expressions such as chr_EMP(EMPNO), which represents the attribute value set of the EMPNO attribute of the EMP table structure. The expression denotes the set [1000..9999].
The definition of characterization chr_EMP tells us the following:

- EMPNO values are positive integers within the range 1000 to 9999.
- ENAME values are variable length strings with at most nine characters.
- JOB values are restricted to the following five values: 'PRESIDENT', 'MANAGER', 'SALESREP', 'TRAINER', 'ADMIN'.
- BORN and HIRED values are date values.
- SGRADE values are positive integers in the range 1 to 99.
- MSAL values are positive numbers with precision seven and scale two.
- USERNAME values are uppercase variable length strings with at most 15 characters.
- DEPTNO values are positive integers in the range 1 to 99.

In the remainder of our database design definition, four sets will occur quite frequently: employee numbers, department numbers, salary-related amounts, and course codes. We define shorthand names (symbols for ease of reference in the text) for them here, and use these in the characterization definitions that follow.
EMPNO_TYP := { n | n∊number(4,0) ∧ n > 999 }
DEPTNO_TYP := { n | n∊number(2,0) ∧ n > 0 }
SALARY_TYP := { n | n∊number(7,2) ∧ n > 0 }
CRSCODE_TYP := { s | s∊varchar(6) ∧ s = upper(s) }
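For illustration, these four shorthand sets can be approximated in Python as membership predicates (a sketch of ours; since sets such as SALARY_TYP are far too large to enumerate, we test membership instead of listing elements):

```python
# Hypothetical Python sketch of the four shorthand attribute value sets.
from decimal import Decimal

def in_empno_typ(n):
    # number(4,0) with n > 999: four-digit integers 1000..9999
    return isinstance(n, int) and 999 < n <= 9999

def in_deptno_typ(n):
    # number(2,0) with n > 0: two-digit integers 1..99
    return isinstance(n, int) and 0 < n <= 99

def in_salary_typ(n):
    # number(7,2) with n > 0: positive amounts, at most five digits before
    # and two digits after the decimal point
    d = Decimal(str(n))
    return d > 0 and d == d.quantize(Decimal('0.01')) and d < Decimal('100000')

def in_crscode_typ(s):
    # varchar(6) with s = upper(s): uppercase strings of at most six characters
    return isinstance(s, str) and len(s) <= 6 and s == s.upper()
```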
Listings 7-3 through 7-11 introduce the characterizations for the remaining table structures. You might want to revisit Table 7-1 (the external predicates) while going over these characterizations. Embedded comments clarify attribute constraints where deemed necessary.
chr_SREP :=
{ ( EMPNO; EMPNO_TYP )
/* Targets for sales reps are five digit numbers */
, ( TARGET; [10000..99999] )
, ( COMM; SALARY_TYP )
}
chr_MEMP :=
{ ( EMPNO; EMPNO_TYP )
, ( MGR; EMPNO_TYP )
}
chr_TERM :=
{ ( EMPNO; EMPNO_TYP )
, ( LEFT; date )
, ( COMMENTS; varchar(60) )
}
chr_DEPT :=
{ ( DEPTNO; DEPTNO_TYP )
, ( DNAME; { s | s∊varchar(12) ∧ upper(s) = s } )
, ( LOC; { s | s∊varchar(14) ∧ upper(s) = s } )
, ( MGR; EMPNO_TYP )
}
chr_GRD :=
{ ( GRADE; { n | n∊number(2,0) ∧ n > 0 } )
, ( LLIMIT; SALARY_TYP )
, ( ULIMIT; SALARY_TYP )
, ( BONUS; SALARY_TYP )
}
chr_CRS :=
{ ( CODE; CRSCODE_TYP )
, ( DESCR; varchar(40) )
/* Course category values: Design, Generate, Build */
, ( CAT; {'DSG','GEN','BLD'} )
/* Course duration must be between 1 and 15 days */
, ( DUR; [1..15] )
}
chr_OFFR :=
{ ( COURSE; CRSCODE_TYP )
, ( STARTS; date )
/* Three STATUS values allowed: Scheduled, Confirmed, Canceled */
, ( STATUS; {'SCHD','CONF','CANC'} )
/* Maximum course offering capacity; minimum = 6 */
, ( MAXCAP; [6..100] )
/* TRAINER = -1 means "no trainer assigned" */
, ( TRAINER; EMPNO_TYP ∪ { -1 } )
, ( LOC; varchar(14) )
}
chr_REG :=
{ ( STUD; EMPNO_TYP )
, ( COURSE; CRSCODE_TYP )
, ( STARTS; date )
/* -1: too early to evaluate (course is in the future) */
/* 0: not evaluated by attendee */
/* 1-5: regular evaluation values (from 1=bad to 5=excellent) */
, ( EVAL; [-1..5] )
}
chr_HIST :=
{ ( EMPNO; EMPNO_TYP )
, ( UNTIL; date )
, ( DEPTNO; DEPTNO_TYP )
, ( MSAL; SALARY_TYP )
}
Note that in Listing 7-9 the attribute value set for attribute TRAINER includes a special value -1 next to valid employee numbers. This value represents the fact that no trainer has been assigned yet. In our formal database design specification method, there is no such thing as a NULL, which is a “value” commonly (mis)used by SQL database management systems to indicate a missing value. There are no missing values inside tuples; tuples always have a value attached to every attribute. Characterizations specify the attribute value sets from which these values can be chosen. So, to represent a “missing trainer” value, you must explicitly include a value for this fact inside the corresponding attribute value set. Something similar is specified in Listing 7-10 in the attribute value set for the EVAL attribute.
Note Appendix F will explicitly deal with the phenomenon of NULLs. Chapter 11 will revisit these -1 values when we sort out the database design implementation issues and provide guidelines.
The specification of our database design started out with a skeleton definition and the external predicates for the table structures introduced by the skeleton. In this section you were introduced to the characterizations of the example database design. Through the attribute value sets, you are steadily gaining more insight into the meaning of this database design. The following section will advance this insight to the next layer: the tuple universes.
A tuple universe is a (non-empty) set of tuples. It is a very special set of tuples; this set is meant to hold only tuples that are admissible for a given table structure. You know by now that tuples are represented as functions. For instance, here is an example function tdept1 that represents a possible tuple for the DEPT table structure:
tdept1 := {(DEPTNO;10), (DNAME;'ACCOUNTING'), (LOC;'DALLAS'), (MGR;1240)}
As you can see, the domain of tdept1 represents the set of attributes for table structure DEPT as introduced by database skeleton DB_S.
dom(tdept1) = {DEPTNO, DNAME, LOC, MGR} = DB_S(DEPT)
And, for every attribute, tdept1 yields a value from the corresponding attribute value set, as introduced by the characterization for the DEPT table structure:
tdept1(DEPTNO) = 10, which is an element of chr_DEPT(DEPTNO)
tdept1(DNAME) = 'ACCOUNTING', which is an element of chr_DEPT(DNAME)
tdept1(LOC) = 'DALLAS', which is an element of chr_DEPT(LOC)
tdept1(MGR) = 1240, which is an element of chr_DEPT(MGR)
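These checks can be sketched in Python. Below, chr_DEPT is modeled (as an illustration of ours, with hypothetical membership predicates) as a dict of predicates, and a possible DEPT tuple is a dict whose key set equals the DEPT heading:

```python
# Sketch: tuples as dicts. A tuple's domain is its key set, and every
# attribute must map into the attribute value set given by chr_DEPT.
chr_DEPT = {
    'DEPTNO': lambda n: isinstance(n, int) and 0 < n <= 99,
    'DNAME':  lambda s: isinstance(s, str) and len(s) <= 12 and s == s.upper(),
    'LOC':    lambda s: isinstance(s, str) and len(s) <= 14 and s == s.upper(),
    'MGR':    lambda n: isinstance(n, int) and 999 < n <= 9999,
}

tdept1 = {'DEPTNO': 10, 'DNAME': 'ACCOUNTING', 'LOC': 'DALLAS', 'MGR': 1240}

def is_possible_dept_tuple(t):
    # dom(t) must equal the DEPT heading, and every attribute value must be
    # an element of the corresponding attribute value set
    return set(t) == set(chr_DEPT) and all(chr_DEPT[a](t[a]) for a in t)
```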
Here’s another possible tuple for the DEPT table structure:
tdept2 := {(DEPTNO;20), (DNAME;'SALES'), (LOC;'HOUSTON'), (MGR;1755)}
Now consider the set {tdept1, tdept2}. This is a set that holds two tuples. Theoretically it could represent the tuple universe for the DEPT table structure. However, it is a rather small tuple universe; it is very unlikely that it represents the tuple universe for the DEPT table structure. The tuple universe for a given table structure should hold every tuple that we allow (admit) for the table structure.
Note Tuples tdept1 and tdept2 are functions that share the same domain. This is a requirement for a tuple universe; all tuples in the tuple universe share the same domain, which in turn is equal to the heading of the given table structure.
You have already seen how you can generate a set that holds every possible tuple for a given table structure using the characterization of that table structure (see the section “Table Construction” in Chapter 5). If you apply the generalized product to a characterization, you’ll end up with a set of tuples. This set is not just any set of tuples, but it is precisely the set of all possible tuples based on the attribute value sets that the characterization defines.
Let us illustrate this once more with a small example. Suppose you’re designing a table structure called RESULT; it holds average scores for courses followed by students that belong to a certain population. Here’s the external predicate for RESULT: “The rounded average score scored by students of population POPULATION for course COURSE is AVG_SCORE.” Listing 7-12 defines the characterization chr_RESULT for this table structure.
chr_RESULT :=
{ ( POPULATION; {'DP','NON-DP'} )
/* DP = Database Professionals, NON-DP = Non Database Professionals */
, ( COURSE; {'set theory','logic'} )
, ( AVG_SCORE; {'A','B','C','D','E','F'} )
}
The three attribute value sets represent the attribute constraints for the RESULT table structure. If you apply the generalized product Π to chr_RESULT, you get the following set of possible tuples for the RESULT table structure:
Π(chr_RESULT) =
{ { (POPULATION; 'DP'), (COURSE; 'set theory'), (AVG_SCORE; 'A') }
, { (POPULATION; 'DP'), (COURSE; 'set theory'), (AVG_SCORE; 'B') }
, { (POPULATION; 'DP'), (COURSE; 'set theory'), (AVG_SCORE; 'C') }
, { (POPULATION; 'DP'), (COURSE; 'set theory'), (AVG_SCORE; 'D') }
, { (POPULATION; 'DP'), (COURSE; 'set theory'), (AVG_SCORE; 'E') }
, { (POPULATION; 'DP'), (COURSE; 'set theory'), (AVG_SCORE; 'F') }
, { (POPULATION; 'DP'), (COURSE; 'logic'), (AVG_SCORE; 'A') }
, { (POPULATION; 'DP'), (COURSE; 'logic'), (AVG_SCORE; 'B') }
, { (POPULATION; 'DP'), (COURSE; 'logic'), (AVG_SCORE; 'C') }
, { (POPULATION; 'DP'), (COURSE; 'logic'), (AVG_SCORE; 'D') }
, { (POPULATION; 'DP'), (COURSE; 'logic'), (AVG_SCORE; 'E') }
, { (POPULATION; 'DP'), (COURSE; 'logic'), (AVG_SCORE; 'F') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'A') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'B') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'C') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'D') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'E') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'F') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'A') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'B') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'C') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'D') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'E') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'F') }
}
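The generalized product is straightforward to emulate in code. Here is a minimal Python sketch of ours (not part of the formal notation) that generates all possible RESULT tuples from the attribute value sets of chr_RESULT:

```python
# Sketch of the generalized product operator Π applied to a characterization:
# it yields every function (modeled as a dict) that draws exactly one value
# from the attribute value set of each attribute.
from itertools import product

def generalized_product(chr_):
    attrs = sorted(chr_)                     # fix an attribute order
    return [dict(zip(attrs, combo))
            for combo in product(*(sorted(chr_[a]) for a in attrs))]

chr_RESULT = {
    'POPULATION': {'DP', 'NON-DP'},
    'COURSE':     {'set theory', 'logic'},
    'AVG_SCORE':  {'A', 'B', 'C', 'D', 'E', 'F'},
}

possible_tuples = generalized_product(chr_RESULT)   # 2 * 2 * 6 = 24 tuples
```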
In this set of 24 tuples, the previously defined attribute constraints hold. However, no restrictions exist in this set with regard to combinations of attribute values of different attributes inside a tuple. By specifying inter-attribute constraints—or rather, tuple constraints—you can restrict the set of possible tuples to the set of admissible tuples for the given table.
Suppose that you do not allow average scores D, E, and F for database professionals, nor average scores A and B for non-database professionals (regardless of the course). You can specify this by the following definition of tuple universe tup_RESULT; it formally specifies two tuple predicates:
tup_RESULT :=
{ r | r∊Π(chr_RESULT) ∧
/* ============================ */
/* Tuple constraints for RESULT */
/* ============================ */
/* Database professionals never score an average of D, E or F */
( r(POPULATION)='DP' ⇒ r(AVG_SCORE)∉{'D','E','F'} ) ∧
/* Non database professionals never score an average of A or B */
( r(POPULATION)='NON-DP' ⇒ r(AVG_SCORE)∉{'A','B'} )
}
The tuple predicates introduced by the definition of a tuple universe are referred to as tuple constraints. You can also specify set tup_RESULT in the enumerative way.

Note The original set of 24 possible tuples has now been reduced to a set of 14 admissible tuples. Ten tuples did not satisfy the tuple constraints that are specified in tup_RESULT.
{ { (POPULATION; 'DP'), (COURSE; 'set theory'), (AVG_SCORE; 'A') }
, { (POPULATION; 'DP'), (COURSE; 'set theory'), (AVG_SCORE; 'B') }
, { (POPULATION; 'DP'), (COURSE; 'set theory'), (AVG_SCORE; 'C') }
, { (POPULATION; 'DP'), (COURSE; 'logic'), (AVG_SCORE; 'A') }
, { (POPULATION; 'DP'), (COURSE; 'logic'), (AVG_SCORE; 'B') }
, { (POPULATION; 'DP'), (COURSE; 'logic'), (AVG_SCORE; 'C') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'C') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'D') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'E') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'F') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'C') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'D') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'E') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'), (AVG_SCORE; 'F') }
}
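The two specifications above—predicative and enumerative—can be checked mechanically. The following Python sketch (illustrative only; the formal notation of this chapter is not executable) builds Π(chr_RESULT) and then filters it with the two tuple constraints. Attribute names are taken from the chapter; the representation as dictionaries is an assumption of this sketch.

```python
from itertools import product

# Characterization of RESULT: admissible values per attribute,
# taken from the chr_RESULT definition earlier in the chapter.
chr_RESULT = {
    'POPULATION': {'DP', 'NON-DP'},
    'COURSE': {'set theory', 'logic'},
    'AVG_SCORE': {'A', 'B', 'C', 'D', 'E', 'F'},
}

# Pi(chr_RESULT): every tuple that can be built from the characterization.
attrs = sorted(chr_RESULT)
possible = [dict(zip(attrs, vals))
            for vals in product(*(sorted(chr_RESULT[a]) for a in attrs))]

def tuple_constraints(r):
    """The two tuple predicates of tup_RESULT (p ⇒ q coded as ¬p ∨ q)."""
    c1 = (r['POPULATION'] != 'DP') or (r['AVG_SCORE'] not in {'D', 'E', 'F'})
    c2 = (r['POPULATION'] != 'NON-DP') or (r['AVG_SCORE'] not in {'A', 'B'})
    return c1 and c2

tup_RESULT = [r for r in possible if tuple_constraints(r)]
print(len(possible), len(tup_RESULT))  # → 24 14
```

Note how the predicative style carries over directly: each implication becomes a short Boolean expression, and the comprehension plays the role of set-builder notation.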
Note that the former specification of tup_RESULT, using the predicative method to specify a set, is highly preferred over the latter enumerative specification, because it explicitly shows us what the tuple constraints are (and it is a shorter definition too; much shorter in general).
Now let’s continue with our example database design. Take a look at Listing 7-13, which defines tuple universe tup_EMP for the EMP table structure of the example database design.
tup_EMP :=
{ e | e∊Π(chr_EMP) ∧
/* ========================= */
/* Tuple constraints for EMP */
/* ========================= */
/* We hire adult employees only */
e(BORN) + 18 ≤ e(HIRED) ∧
/* Presidents earn more than 120K */
e(JOB) = 'PRESIDENT' ⇒ 12*e(MSAL) > 120000 ∧
/* Administrators earn less than 5K */
e(JOB) = 'ADMIN' ⇒ e(MSAL) < 5000
}
Note In this definition, we assume that addition has been defined for values of type date (see Table 2-4), enabling us to add years to such a value.
Are you starting to see how this works? Tuple universe tup_EMP is a subset of Π(chr_EMP). All tuples that do not satisfy the tuple constraints (three in total) specified in the definition of tup_EMP are left out. You can use any of the logical connectives introduced in Table 1-2 of Chapter 1 in conjunction with valid attribute expressions to formally specify tuple constraints.
Note that all ambiguity is ruled out by these formal specifications:
- By “adult,” 18 or older is meant. The ≤ symbol implies that the day someone turns 18 he or she can be hired.
- The “K” in 120K and 5K (in the comments) represents the integer 1000 and not 1024.
- The salaries mentioned (informally by the users and formally inside the specifications) are actually the monthly salary in the case of an ADMIN and the yearly salary in the case of a PRESIDENT. This could be a habit in the real world, and it might be wise to reflect this in the formal specification too. Of course, you can also specify the predicate involving the PRESIDENT this way: e(JOB) = 'PRESIDENT' ⇒ e(MSAL) > 10000.
Listings 7-14 through 7-22 introduce the tuple universes for the other table structures in our database design. You’ll find embedded informal comments to clarify the tuple constraints. Note that tuple constraints are only introduced for table structures GRD, CRS, and OFFR; the other table structures happen to have no tuple constraints.
tup_SREP :=
{ s | s∊Π(chr_SREP) /* No tuple constraints for SREP */ }
tup_MEMP :=
{ m | m∊Π(chr_MEMP) }
tup_TERM :=
{ t | t∊Π(chr_TERM) }
tup_DEPT :=
{ d | d∊Π(chr_DEPT) }
tup_GRD :=
{ g | g∊Π(chr_GRD) ∧
/* Salary grades have a "bandwidth" of at least 500 dollars */
g(LLIMIT) ≤ g(ULIMIT) - 500 ∧
/* Bonus must be less than lower limit */
g(BONUS) < g(LLIMIT)
}
tup_CRS :=
{ c | c∊Π(chr_CRS) ∧
/* Build courses never take more than 5 days */
c(CAT) = 'BLD' ⇒ c(DUR) ≤ 5
}
tup_OFFR :=
{ o | o∊Π(chr_OFFR) ∧
/* Unassigned TRAINER allowed only for certain STATUS values */
o(TRAINER) = -1 ⇒ o(STATUS)∊{'CANC','SCHD'}
}
tup_REG :=
{ r | r∊Π(chr_REG) }
tup_HIST :=
{ h | h∊Π(chr_HIST) }
Listing 7-20 defines when the special -1 value is allowed for the TRAINER attribute; confirmed offerings (STATUS = 'CONF') must have an employee number assigned as the trainer.
This concludes the tuple universe layer of our example database design. Through the specification of the tuple constraints, you’ve gained more insight into the meaning of this database design. The next section continues the construction of the database design’s specification, by advancing to the table universe layer. As you’ll see, this involves the application of more set-theory and logic concepts that were introduced in Part 1 of this book.
You can use a tuple universe to build a set of admissible tables (we’ll demonstrate this shortly). Such a set is called a table universe. Every element in a table universe is an admissible table for the corresponding table structure.
A tuple universe is a set of tuples and can be considered a table too. It is a rather large set of tuples, because it has every tuple that can be built using the characterization and taking into consideration the tuple constraints. We’ve mentioned before that a tuple universe can be considered the largest table for a given table structure.
Every subset of a tuple universe is a table too. In fact, if you would construct a set that holds every subset of a tuple universe, then this set would contain lots of tables; every possible table for a given table structure would be in this set. Do you remember, from Part 1 of this book, how to construct the set of all subsets of a given set? The powerset operator does just that. The powerset of a tuple universe can be considered the set of all possible tables for a given table structure.
Note You might want to revisit the section “Powersets and Partitions” in Chapter 2, and refresh your memory regarding the powerset operator.
In a similar way as tuple universes are defined, you can restrict the powerset of a tuple universe to obtain the set of admissible tables. You can add table predicates (constraining combinations of tuples) to discard possible tables that were generated by the powerset operator, but that do not reflect a valid representation of the real world. Table predicates that are used to restrict the powerset of a tuple universe are referred to as table constraints.
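The powerset operator is easy to make concrete. Here is a small Python sketch (an illustration, not part of the chapter’s formal notation) that generates all subsets of a set, confirming that a tuple universe of 14 tuples yields 2 to the 14th power possible tables.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets: the ℘ operator from Chapter 2."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

# A set of 3 elements has 2**3 = 8 subsets.
print(len(powerset([1, 2, 3])))       # → 8
# A tuple universe with 14 tuples yields 2**14 possible tables.
print(len(powerset(range(14))))       # → 16384
```

Each subset returned here corresponds to one possible table for the given table structure; the table predicates introduced next discard most of them.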
Let’s illustrate all this using the RESULT table structure that was introduced in the previous section. The powerset of tuple universe tup_RESULT results in a set that holds every possible RESULT table. There are lots of tables in this set. To be precise, because the cardinality of tup_RESULT is 14, there are exactly 16384 (2 to the 14th power) possible tables. These are far too many to list in the enumerative way. Figure 7-2 displays just one of these tables (an arbitrarily chosen subset of tup_RESULT). Let’s name this table R1.
Table R1 is a subset of tup_RESULT. It holds 11 distinct tuples. Because these tuples originate from the tuple universe, all of them are admissible tuples; they satisfy the tuple constraints, and every attribute holds an admissible value.
Now assume that the following (informally specified) data integrity constraints play a role in a table for the RESULT table structure:
- The combination of POPULATION and COURSE is uniquely identifying in a RESULT table (constraint P1).
- A RESULT table is empty, or it holds exactly four tuples: one tuple for every combination of POPULATION and COURSE (constraint P2).
- The average score (AVG_SCORE) of the logic course is always higher than the average score of the set theory course; score A is the highest, F the lowest (constraint P3).
- Non-database professionals always have a lower average score than database professionals (constraint P4).
Note Some of these integrity constraints are rather contrived. Their sole purpose in this example is to bring down the number of tables in the table universe of the RESULT table structure to such an extent that it becomes feasible to enumerate all admissible tables.
In Listing 7-23 you can find these four constraints formally specified as table predicates, using the names P1 through P4 introduced earlier. To be able to compare average scores in these specifications, we introduce a function f, which is defined as follows:
f := { ('A';6), ('B';5), ('C';4), ('D';3), ('E';2), ('F';1) }
This enables us to compare, for instance, scores B and E. Because f(B)=5 and f(E)=2 (and 5>2), we can say that B is a higher score than E.
P1(T) := ( ∀r1,r2∊T: r1↓{POPULATION,COURSE} = r2↓{POPULATION,COURSE} ⇒ r1 = r2 )
P2(T) := ( #T = 0 ∨ #T = 4 )
P3(T) := ( ∀r1,r2∊T: ( r1(POPULATION) = r2(POPULATION) ∧
r1(COURSE) = 'logic' ∧ r2(COURSE) = 'set theory' )
⇒ f(r1(AVG_SCORE)) > f(r2(AVG_SCORE)) )
P4(T) := ¬( ∃r1,r2∊T: r1(POPULATION) = 'NON-DP' ∧ r2(POPULATION) = 'DP' ∧
f(r1(AVG_SCORE)) ≥ f(r2(AVG_SCORE)) )
Table predicate P1 is one of the common types of data integrity predicates that were introduced in the section “Unique Identification Predicate” in Chapter 6.
Table predicate P2 is rather simple; the cardinality of the table should be either zero or four. Here is an alternative way to specify this: #T∊{0,4}.
As you can see, table predicate P3 specifies that the average score for the logic course should always be higher than the average score for the set theory course within a population. The first conjunct in the universal quantification—r1(POPULATION) = r2(POPULATION)—specifies this. A user might take for granted that average scores are compared within a given population and fail to mention this explicitly when conveying the requirement informally (as happened in the preceding informal specification); the formal specification forces this decision out into the open.
Table predicate P4 unambiguously specifies that the “lower average score” mentioned in the informal specification is meant irrespective of the course; there is no conjunct r1(COURSE) = r2(COURSE) inside the existential quantification.
You can instantiate predicates P1 through P4 using table R1 that was introduced in Figure 7-2. Check for yourself that table R1 violates predicates P1, P2, and P3, and that it satisfies predicate P4.
P1(R1) = false
P2(R1) = false
P3(R1) = false
P4(R1) = true
With these formal table predicate specifications you can now define the table universe for the RESULT table structure. Take a look at Listing 7-24, which formally specifies table universe tab_RESULT using tuple universe tup_RESULT and table predicates P1, P2, P3, and P4.
tab_RESULT :=
{ R | R∊℘(tup_RESULT) ∧ P1(R) ∧ P2(R) ∧ P3(R) ∧ P4(R)
}
tab_RESULT holds every subset of tup_RESULT that satisfies all four table predicates; obviously table R1 is not an element of tab_RESULT. The table predicates restrict the powerset of a tuple universe and are therefore referred to as table constraints.
Note If a unique identification predicate constitutes a table constraint (as does P1 in the preceding case), then the set of uniquely identifying attributes is commonly referred to as a key for the given table structure. In this case {POPULATION,COURSE} is a key for the RESULT table structure.
Table constraints P1, P2, P3, and P4 are contrived such that they significantly bring down the total number of tables from the original 16384 possible tables generated by the powerset; in fact, only 13 admissible tables remain in this table universe. Listing 7-25 displays an enumerative specification of table universe tab_RESULT.
tab_RESULT :=
{ ∅
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'B') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'A') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'D') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'C') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'B') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'A') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'E') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'C') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'B') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'A') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'C') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'B') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'A') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'E') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'B') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'A') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'B') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'A') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'E') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'C') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'A') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'E') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'C') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'A') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'C') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'A') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'E') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'C') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'B') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'E') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'C') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'B') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'), (COURSE;'set theory'), (AVG_SCORE;'C') }
, { (POPULATION;'DP'), (COURSE;'logic'), (AVG_SCORE;'B') }
, { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
, { (POPULATION;'NON-DP'), (COURSE;'logic'), (AVG_SCORE;'E') } }
}
This set has 13 elements, each of which is a table. The first one represents the empty table; the other 12 all are sets of 4 tuples. You might want to validate for yourself that all these tables satisfy all four table constraints specified and that there are no other tables (not listed in Listing 7-25) that also satisfy all four table constraints.
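The validation suggested above can be mechanized. The following Python sketch (an illustration; the tuple representation as triples is an assumption of this sketch) rebuilds tup_RESULT, implements P1 through P4 and function f as defined in the chapter, and filters the full powerset of 16384 possible tables.

```python
from itertools import combinations

# tup_RESULT as (POPULATION, COURSE, AVG_SCORE) triples: the 14 admissible tuples.
POPS, COURSES, SCORES = ('DP', 'NON-DP'), ('set theory', 'logic'), 'ABCDEF'
f = {'A': 6, 'B': 5, 'C': 4, 'D': 3, 'E': 2, 'F': 1}
tup_RESULT = [(p, c, s) for p in POPS for c in COURSES for s in SCORES
              if not (p == 'DP' and s in 'DEF')
              and not (p == 'NON-DP' and s in 'AB')]

def P1(T):  # {POPULATION, COURSE} uniquely identifies a tuple
    return all(r1 == r2 for r1 in T for r2 in T if r1[:2] == r2[:2])

def P2(T):  # empty, or exactly four tuples
    return len(T) in (0, 4)

def P3(T):  # within a population, logic scores exceed set theory scores
    return all(f[r1[2]] > f[r2[2]] for r1 in T for r2 in T
               if r1[0] == r2[0] and r1[1] == 'logic' and r2[1] == 'set theory')

def P4(T):  # no non-DP score is greater than or equal to any DP score
    return not any(f[r1[2]] >= f[r2[2]] for r1 in T for r2 in T
                   if r1[0] == 'NON-DP' and r2[0] == 'DP')

# Filter the powerset of tup_RESULT with all four table constraints.
tab_RESULT = [set(T) for n in range(len(tup_RESULT) + 1)
              for T in combinations(tup_RESULT, n)
              if all(P(T) for P in (P1, P2, P3, P4))]
print(len(tab_RESULT))  # → 13
```

Running this confirms the count claimed above: exactly 13 of the 16384 subsets survive all four table constraints.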
In case you have some trouble reading the nested set of sets, Figure 7-3 might clarify what is going on. It shows another way to specify table universe tab_RESULT using the shorthand notation for tables.
Note In general, table universes hold far too many elements to enumerate them.
Let’s now continue with the specification of our example database design. Take a look at Listing 7-26, which shows the definition of table universe tab_EMP for the EMP table structure. This definition formally specifies four table constraints for EMP. Some informal clarification is embedded as comments.
tab_EMP :=
{ E | E∊ ℘(tup_EMP) ∧
/* EMPNO uniquely identifies an employee tuple */
( ∀e1,e2∊E: e1(EMPNO) = e2(EMPNO) ⇒ e1 = e2 )
∧
/* USERNAME uniquely identifies an employee tuple */
( ∀e1,e2∊E: e1(USERNAME) = e2(USERNAME) ⇒ e1 = e2 )
∧
/* At most one president allowed */
#{ e | e∊E ∧ e(JOB) = 'PRESIDENT' } ≤ 1
∧
/* A department that employs the president or a manager */
/* should also employ at least one administrator */
( ∀d∊E⇓{DEPTNO}:
( ∃e2∊E: e2(DEPTNO) = d(DEPTNO) ∧ e2(JOB) ∊ {'PRESIDENT','MANAGER'} )
⇒
( ∃e3∊E: e3(DEPTNO) = d(DEPTNO) ∧ e3(JOB) = 'ADMIN' )
)
}
The first two constraints are unique identification predicates; the EMPNO attribute is uniquely identifying in an EMP table, as is the USERNAME attribute.
The specification of the third constraint constructs the subset of tuples that have the value PRESIDENT for the JOB attribute and restricts the cardinality of this set to at most 1. This reflects the real-world requirement that there cannot be more than one president.
The last table constraint states that if there exists a PRESIDENT or a MANAGER in a given department, then there exists an ADMIN in the same department. The outer universal quantification supplies all departments for which to perform this check. Note that the informal embedded specification of this constraint might imply that this constraint is a multi-table constraint involving not just the EMP table structure but also the DEPT table structure. Take a look at the following database predicate named P that accepts two parameters: a department table (parameter D) and an employee table (parameter E).
P(D,E) :=
( ∀d∊D:
( ∃e2∊E: e2(DEPTNO) = d(DEPTNO) ∧ e2(JOB) ∊ {'PRESIDENT','MANAGER'} )
⇒
( ∃e3∊E: e3(DEPTNO) = d(DEPTNO) ∧ e3(JOB) = 'ADMIN' )
)
This predicate closely follows the informal specification: it quantifies over all departments and states that if the department employs a PRESIDENT or a MANAGER, then the department should employ an ADMIN. Note that only the DEPTNO attribute of parameter D is used. The corresponding predicate inside the definition of tab_EMP quantifies over all available DEPTNO values inside the EMP table and then states the same. Because a DEPTNO attribute is available in the EMP table structure (and, as we shall see later, it always identifies a valid department), this constraint can be specified as a table constraint.
Caution It is not uncommon that constraints that are, in fact, table constraints are specified as database (multi-table) constraints. This can also happen one phase earlier: tuple constraints that are specified as table constraints. This refers back to the comments we made in Chapter 6 when we mentioned terms such as “tuple predicate in disguise.” In the database design specification methodology introduced in this book, you should always try to specify constraints at the earliest possible phase: attribute value sets first, then tuple constraints, then table constraints, and finally database constraints. Not only will this result in a clear view of your database design, but it will also aid in your effort to implement it (covered in Chapter 11).
Here is an alternative (equivalent) formal specification for the last table constraint of EMP that might be easier to understand, and that clearly shows there is no need to involve the DEPT table structure:
( ∀e2∊E: e2(JOB) ∊ {'PRESIDENT','MANAGER'}
⇒
( ∃e3∊E: e3(DEPTNO) = e2(DEPTNO) ∧ e3(JOB) = 'ADMIN' )
)
This predicate states that for all employee tuples that represent a PRESIDENT or a MANAGER, there should exist an employee tuple that represents an ADMIN working in the same department. You probably intuitively feel that this formal specification is equivalent, and therefore represents the same constraint. In fact, it can formally be proven equivalent to the specification of this constraint inside tab_EMP. However, this requires the development of more rewrite rules regarding quantifications, which is beyond the scope of this book.
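Short of a formal proof, you can at least test the two formulations against each other. The Python sketch below implements both (attribute names from the chapter; the sample EMP tables are hypothetical) and checks that they agree on a satisfying and a violating table.

```python
def dept_constraint_v1(E):
    """As inside tab_EMP: quantify over the DEPTNO values present in E."""
    depts = {e['DEPTNO'] for e in E}
    return all(
        any(e3['DEPTNO'] == d and e3['JOB'] == 'ADMIN' for e3 in E)
        for d in depts
        if any(e2['DEPTNO'] == d and e2['JOB'] in ('PRESIDENT', 'MANAGER')
               for e2 in E))

def dept_constraint_v2(E):
    """The alternative: quantify over president/manager tuples directly."""
    return all(
        any(e3['DEPTNO'] == e2['DEPTNO'] and e3['JOB'] == 'ADMIN' for e3 in E)
        for e2 in E if e2['JOB'] in ('PRESIDENT', 'MANAGER'))

# Two small, hypothetical EMP tables: one satisfying, one violating.
ok = [{'DEPTNO': 10, 'JOB': 'MANAGER'}, {'DEPTNO': 10, 'JOB': 'ADMIN'}]
bad = [{'DEPTNO': 20, 'JOB': 'PRESIDENT'}, {'DEPTNO': 30, 'JOB': 'ADMIN'}]
for E in (ok, bad):
    assert dept_constraint_v1(E) == dept_constraint_v2(E)
print(dept_constraint_v1(ok), dept_constraint_v1(bad))  # → True False
```

Agreement on a handful of samples is of course no proof of equivalence, but it is a cheap sanity check when you rewrite a constraint.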
Listings 7-27, 7-28, and 7-29 define the table universes for the SREP, MEMP, and TERM table structures. As you can see, they all have a unique identification constraint and no other table constraints.
tab_SREP :=
{ S | S∊ ℘(tup_SREP) ∧
/* EMPNO uniquely identifies a tuple */
( ∀s1,s2∊S: s1(EMPNO) = s2(EMPNO) ⇒ s1 = s2 )
}
tab_MEMP :=
{ M | M∊ ℘(tup_MEMP) ∧
/* EMPNO uniquely identifies a tuple */
( ∀m1,m2∊M: m1(EMPNO) = m2(EMPNO) ⇒ m1 = m2 )
}
tab_TERM :=
{ T | T ∈ ℘(tup_TERM) ∧
/* EMPNO uniquely identifies a tuple */
( ∀t1,t2∈ T: t1(EMPNO) = t2(EMPNO) ⇒ t1 = t2 )
}
The table universe for DEPT (shown in Listing 7-30) introduces two unique identification constraints. Next to {DEPTNO}, the set of attributes {DNAME,LOC} is also uniquely identifying in a DEPT table.
tab_DEPT :=
{ D | D∈ ℘(tup_DEPT) ∧
/* Department number uniquely identifies a tuple */
( ∀d1,d2∈D: d1(DEPTNO) = d2(DEPTNO) ⇒ d1 = d2 )
∧
/* Department name and location uniquely identify a tuple */
( ∀d1,d2∈D:
d1↓{DNAME,LOC} = d2↓{DNAME,LOC} ⇒ d1 = d2 )
∧
/* You cannot manage more than two departments */
( ∀m∈D⇓{MGR}: #{ d | d∈D ∧ d(MGR) = m(MGR) } ≤ 2 )
}
The second unique identification constraint states that there cannot be two departments that have the same name and that are located in the same location; department names are unique within a location. The third table constraint is a formal representation of the constraint that no employee can manage more than two departments. It states that for every department manager, the cardinality of the set of departments managed by him/her is less than or equal to two.
Listing 7-31 shows the table universe definition for the GRD table structure.
tab_GRD :=
{ G | G∈ ℘(tup_GRD) ∧
/* Salary grade code uniquely identifies a tuple */
( ∀g1,g2∈G: g1(GRADE) = g2(GRADE) ⇒ g1 = g2 )
∧
/* Salary grade lower limit uniquely identifies a tuple */
( ∀g1,g2∈G: g1(LLIMIT) = g2(LLIMIT) ⇒ g1 = g2 )
∧
/* Salary grade upper limit uniquely identifies a tuple */
( ∀g1,g2∈G: g1(ULIMIT) = g2(ULIMIT) ⇒ g1 = g2 )
∧
/* A salary grade overlaps with at most one (lower) grade */
( ∀g1∈G:
( ∃g2∈G: g2(LLIMIT) < g1(LLIMIT) )
⇒
#{ g3 | g3∈G ∧ g3(LLIMIT) < g1(LLIMIT) ∧
g3(ULIMIT) ≥ g1(LLIMIT) ∧
g3(ULIMIT) < g1(ULIMIT) } = 1
)
}
As you can see, this table universe specifies three unique identification constraints for GRD. The fourth table constraint specifies that there cannot be a salary gap between the ULIMIT salary and the LLIMIT salary of two consecutive salary grades. This is done by requiring that for every tuple (say g1) for which there exists a tuple that corresponds to a lower salary grade (say g2), there should exist a tuple (say g3) that corresponds to a lower salary grade and that overlaps with the salary range of tuple g1. In fact, there should exist exactly one such g3, thereby covering every salary by at most two salary grades. The formal specification states exactly what is meant and leaves no room for ambiguity.
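To make the quantifier structure of this fourth constraint tangible, here is a Python sketch that transcribes it directly (attribute names from the chapter; the sample grade tables are hypothetical).

```python
def no_gap_constraint(G):
    """Fourth table constraint of tab_GRD: every grade that has a lower
    grade below it must overlap with exactly one such lower grade."""
    for g1 in G:
        # Antecedent: there exists a tuple with a lower LLIMIT.
        if any(g2['LLIMIT'] < g1['LLIMIT'] for g2 in G):
            overlapping = [g3 for g3 in G
                           if g3['LLIMIT'] < g1['LLIMIT']
                           and g3['ULIMIT'] >= g1['LLIMIT']
                           and g3['ULIMIT'] < g1['ULIMIT']]
            if len(overlapping) != 1:
                return False
    return True

# Hypothetical grade tables (LLIMIT/ULIMIT in dollars):
contiguous = [{'LLIMIT': 1000, 'ULIMIT': 2000},
              {'LLIMIT': 1800, 'ULIMIT': 3000},
              {'LLIMIT': 2800, 'ULIMIT': 4000}]
gapped = [{'LLIMIT': 1000, 'ULIMIT': 2000},
          {'LLIMIT': 2500, 'ULIMIT': 3500}]
print(no_gap_constraint(contiguous), no_gap_constraint(gapped))  # → True False
```

The gapped table fails because the grade starting at 2500 has a lower grade below it, but no lower grade whose upper limit reaches 2500.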
Listings 7-32 and 7-33 define the table universes for the CRS and OFFR table structures. As you can see, only unique identification constraints are involved.
tab_CRS :=
{ C | C∈℘(tup_CRS) ∧
/* Course code uniquely identifies a tuple */
( ∀c1,c2∈C: c1(CODE) = c2(CODE) ⇒ c1 = c2 )
}
tab_OFFR :=
{ O | O∈℘(tup_OFFR) ∧
/* Course code and begin date uniquely identify a tuple */
( ∀o1,o2∈O:
o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ⇒ o1 = o2 )
∧
/* Begin date and (known) trainer uniquely identify a tuple */
( ∀o1,o2∈{ o | o∈O ∧ o(TRAINER) ≠ -1 }:
o1↓{STARTS,TRAINER} = o2↓{STARTS,TRAINER} ⇒ o1 = o2 )
}
The second table constraint is sort of a special case of a unique identification constraint. The set {STARTS,TRAINER} is uniquely identifying within a subset of an OFFR table. This constraint states that a trainer cannot engage more than one course offering per start date. There can certainly be multiple course offerings starting at the same date that have no trainer assigned to them yet; hence the exclusion of offerings with no assigned trainer in the universal quantification.
Note There is obviously a wider limitation here that involves the full duration of an offering: trainers cannot engage another offering for the duration of an assigned offering. This involves the duration of the corresponding course (attribute DUR in the CRS table structure). Therefore, it is a multi-table constraint and will be specified in the next phase (database universe).
Listing 7-34 defines table universe tab_REG for the REG table structure.
tab_REG :=
{ R | R∈ ℘(tup_REG) ∧
/* Attendee and begin date uniquely identify a tuple */
( ∀r1,r2∈R:
r1↓{STARTS,STUD} = r2↓{STARTS,STUD} ⇒ r1 = r2 )
∧
/* Offering is evaluated by all attendees, or it is too early to */
/* evaluate the offering */
( ∀r1,r2∈R:
( r1↓{COURSE,STARTS} = r2↓{COURSE,STARTS} )
⇒
( ( r1(EVAL) = -1 ∧ r2(EVAL) = -1 ) ∨
( r1(EVAL) ≠ -1 ∧ r2(EVAL) ≠ -1 )
) )
}
Do you notice that the first table constraint does not state that {COURSE,STARTS,STUD} is uniquely identifying in a REG table? Considering that REG is a “child table” of CRS, you would typically expect COURSE to be part of the set of attributes. However, in the real world we don’t allow students to register for two offerings that start at the same date, whatever courses are involved in these offerings. Therefore, the reduced set {STARTS,STUD} is uniquely identifying in a REG table. Here too there is a wider limitation involving the duration of a course; it will be specified in the definition of the database universe later on.
The second table constraint states that within the registrations of a course offering, either all evaluations (still) hold value -1, or all of them hold a value that differs from -1. Note that (quoting from the embedded comment) “evaluated by all attendees” includes the special value 0. The idea here is that near the end of the course offering, the trainer explicitly changes—in a single transaction—all evaluation values from -1 (too early to evaluate) to 0 (not evaluated). The students then evaluate the offering individually, each in their own transaction. As you’ll see in the next chapter, you can formally specify this intended behavior through a dynamic (state transition) constraint.
Listing 7-35 defines the last table universe (tab_HIST) regarding the HIST table structure.
tab_HIST :=
{ H | H∈ ℘(tup_HIST) ∧
/* Employee number and history end date uniquely identify a tuple */
( ∀h1,h2∈H: h1↓{EMPNO,UNTIL} = h2↓{EMPNO,UNTIL} ⇒ h1 = h2 )
∧
/* Either department number or monthly salary (or both) */
/* must have changed between two consecutive history records */
( ∀h1,h2∈H:
( h1(EMPNO) = h2(EMPNO) ∧
h1(UNTIL) < h2(UNTIL) ∧
¬ ( ∃h3∈H: h3(EMPNO) = h1(EMPNO) ∧
h3(UNTIL) > h1(UNTIL) ∧
h3(UNTIL) < h2(UNTIL) )
) ⇒
( h1(MSAL) ≠ h2(MSAL) ∨ h1(DEPTNO) ≠ h2(DEPTNO) )
)
}
The first (unique identification) table constraint states that there can be at most one history record per employee per date. The second table constraint states that two consecutive history records (for the same employee) cannot have both an unchanged monthly salary and an unchanged department number; at least one of these attributes must have a different value. The table predicate specifies this by universally quantifying over all pairs of HIST tuples (h1 and h2); if such a pair involves two consecutive HIST tuples for the same employee, then either the MSAL or the DEPTNO attribute values must differ. The fact that two HIST tuples are consecutive is specified by the existential quantification: there should not exist another (third) HIST tuple (h3) for the same employee that “sits in between” the other two tuples.
Again, the formal specification for this constraint leaves no room for ambiguity.
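The nested quantification over consecutive tuples translates mechanically into code. The following Python sketch does so (attribute names from the chapter; the sample tables and the integer encoding of UNTIL dates are hypothetical).

```python
def hist_changed_constraint(H):
    """Second table constraint of tab_HIST: two consecutive history
    tuples for the same employee must differ in MSAL or DEPTNO."""
    for h1 in H:
        for h2 in H:
            # h1 and h2 are consecutive if no third tuple for the same
            # employee sits strictly between them.
            consecutive = (h1['EMPNO'] == h2['EMPNO']
                           and h1['UNTIL'] < h2['UNTIL']
                           and not any(h3['EMPNO'] == h1['EMPNO']
                                       and h1['UNTIL'] < h3['UNTIL'] < h2['UNTIL']
                                       for h3 in H))
            if consecutive and (h1['MSAL'] == h2['MSAL']
                                and h1['DEPTNO'] == h2['DEPTNO']):
                return False
    return True

# Hypothetical history tables; UNTIL modeled as an integer date key.
ok = [{'EMPNO': 1, 'UNTIL': 20100101, 'MSAL': 3000, 'DEPTNO': 10},
      {'EMPNO': 1, 'UNTIL': 20110101, 'MSAL': 3200, 'DEPTNO': 10}]
bad = ok[:1] + [{'EMPNO': 1, 'UNTIL': 20110101, 'MSAL': 3000, 'DEPTNO': 10}]
print(hist_changed_constraint(ok), hist_changed_constraint(bad))  # → True False
```

The second sample fails because its two consecutive records for employee 1 carry the same salary and the same department number.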
This concludes the treatment of the table universes of the example database design. The next section moves on to the definition of the database universe. This definition will build on the table universes and will formally specify the database constraints.
You have now reached the last stepping stone (phase) of this book’s formal methodology to specify a database design. As with the other phases, it continues to build on the sets specified in the previous phase.
You can use the table universes to build a set of admissible database states (we’ll demonstrate this shortly). Such a set is called a database universe. Every element in the database universe is an admissible database state for the database design.
At the beginning of this chapter, we said, “Every database state is an admissible set of n tables (one per table structure).” Of course, this was loosely speaking. The way you formally represent a database state was introduced in Chapter 5; it is a function where every first element of a pair is a table structure name and every second element of a pair is a table. As you’ll see, this way of representing a database state is convenient; we can easily construct functions such as this using the already defined table universes.
Let’s illustrate this with the RESULT table structure that was used in the previous sections. Because this section is about database constraints, we first introduce a second table structure to go along with the RESULT table structure. Let’s assume the existence of a LIMIT table structure. Here’s the external predicate for LIMIT: “Average score SCORE is not allowed for population POPULATION.” The idea is that the tuples in LIMIT limit the admissible tables for RESULT. We’ll specify precisely how they do so later on, by a database constraint.
Listing 7-36 specifies characterization chr_LIMIT, tuple universe tup_LIMIT, and table universe tab_LIMIT.
chr_LIMIT :=
{ ( POPULATION; {'DP','NON-DP'} )
, ( SCORE; {'A','F'} )
}
tup_LIMIT :=
{ l | l∈Π(chr_LIMIT) ∧
l(POPULATION) = 'DP' ⇒ l(SCORE) = 'A' ∧
l(POPULATION) = 'NON-DP' ⇒ l(SCORE) = 'F'
}
tab_LIMIT :=
{ L | L∈℘(tup_LIMIT) ∧
( ∀l1,l2∈L: l1(POPULATION) = l2(POPULATION) ⇒ l1 = l2 )
}
The tuple and table universes for LIMIT are rather small sets. Here is an enumerative specification for both of them:
tup_LIMIT = { { (POPULATION;'DP') , (SCORE;'A') }
, { (POPULATION;'NON-DP'), (SCORE;'F') } }
tab_LIMIT = { Ø
, { { (POPULATION;'DP') , (SCORE;'A') } }
, { { (POPULATION;'NON-DP'), (SCORE;'F') } }
, { { (POPULATION;'DP') , (SCORE;'A') }
, { (POPULATION;'NON-DP'), (SCORE;'F') } }
}
As you can see, there are only four admissible LIMIT tables: the empty table, two tables with one tuple, and one table with two tuples.
Note The unique identification table constraint in the specification of tab_LIMIT is in fact redundant. It is implied by the tuple constraints and the characterization.
Figure 7-4 displays tab_LIMIT using the shorthand notation for tables.
Given these two table structures, we can now take a look at a database state. Figure 7-5 displays a database state, named DBS1, that involves the RESULT and LIMIT table structures.
As you can see, it is a function that holds two pairs. The first ordered pair holds—as the second coordinate—an admissible RESULT table (that is, an element of tab_RESULT). Likewise, the second ordered pair holds an admissible LIMIT table (that is, an element of tab_LIMIT). These tables are accessible via their table structure names: the values appearing as the first coordinates, RESULT and LIMIT. Using the definition of DBS1, you can write expressions such as DBS1(RESULT) and DBS1(LIMIT), which yield the displayed RESULT and LIMIT tables, respectively.
Now take a look at the following set function named DBCHR:
DBCHR := { (RESULT; tab_RESULT), (LIMIT; tab_LIMIT) }
Set function DBCHR is called a database characterization. It characterizes a database in that it introduces the names of the table structures that are involved in a database design and attaches to them the relevant table universes (the sets that hold the admissible tables for each table structure).
Note Database state DBS1 is an element of Π(DBCHR). DBS1 has the same domain as function DBCHR, and every second coordinate of DBS1 is an element chosen from the respective set (tab_RESULT or tab_LIMIT) that appears as the second coordinate in DBCHR.
You can construct a set that holds all possible database states for the result/limit database design, by taking the generalized product of set function DBCHR:
DB_U1 := { dbs | dbs∈Π(DBCHR) }
In set DB_U1, every element dbs is a possible database state given the table universes tab_RESULT and tab_LIMIT. The generalized product operator generates every possible combination of a RESULT table and a LIMIT table; there are 52 combinations in total (#tab_RESULT times #tab_LIMIT, or 13 times 4).
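The generalized product of a set function is straightforward to emulate. In the Python sketch below (illustrative; the tables are abbreviated to mere labels so the counting stands out), Π(DBCHR) becomes a list of dictionaries, each mapping a table structure name to one chosen table.

```python
from itertools import product

def generalized_product(chr_):
    """Π over a set function: every function that chooses one element
    per name (here: one table per table structure name)."""
    names = sorted(chr_)
    return [dict(zip(names, choice))
            for choice in product(*(chr_[n] for n in names))]

# A toy database characterization: 13 RESULT tables and 4 LIMIT tables,
# modeled here simply as labels 0..12 and 0..3.
DBCHR = {'RESULT': range(13), 'LIMIT': range(4)}
states = generalized_product(DBCHR)
print(len(states))  # → 52
```

Each element of `states` plays the role of a possible database state: a function from table structure names to tables, exactly as DBS1 was described above.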
Now, finally, database constraints enter the picture. You can restrict set DB_U1 by adding database predicates inside the specification. Take a look at Listing 7-37, which introduces set DB_U2.
DB_U2 :=
{ dbs | dbs∈Π(DBCHR) ∧
( ∀r∈dbs(RESULT),l∈dbs(LIMIT):
r(POPULATION) = l(POPULATION) ⇒ r(AVG_SCORE) ≠ l(SCORE) ) }
The specification of DB_U2 contains one database predicate. It states that if there is a limit regarding database professionals, then there cannot be a result with an average score of A for database professionals (regardless of the course), and if there is a limit regarding non-database professionals, then there cannot be a result with an average score of F for non-database professionals.
This database predicate discards possible database states that were generated by the generalized product operator, but that do not reflect admissible representations of our real world. Database predicates that constrain the elements in a database universe are called database constraints.
Note Given the database constraint of universe DB_U2, state DBS1 shown in Figure 7-5 is not an admissible state (that is, DBS1∉DB_U2); the RESULT table holds a tuple that has an average score of A for database professionals. This is not allowed given the contents of the LIMIT table in database state DBS1. In universe DB_U2, the database constraint will discard a total of 26 (possible) database states. You might want to verify this for yourself.
We’ll often specify the database constraints in a slightly different manner. Instead of specifying a database predicate with one parameter of type database state, we’ll specify them with two or more parameters of type table. To illustrate this, here is an alternative manner of defining database universe DB_U2:
DB_U2 := { dbs | dbs∈Π(DBCHR) ∧ PDC1(dbs(RESULT),dbs(LIMIT)) }
Inside this definition, we instantiate a predicate named PDC1 that accepts two tables as its arguments. It is defined as follows (R and L represent parameters of type table):
PDC1(R,L) := ( ∀r∈R,l∈L:
r(POPULATION) = l(POPULATION) ⇒ r(AVG_SCORE) ≠ l(SCORE)
)
These two definitions together form an equivalent specification to the one given in Listing 7-37. This way of specifying shows immediately how many tables are involved in each database constraint, and inside the database predicate you can use the table parameters directly instead of having to invoke the database state with a certain table structure name.
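As a sketch of this two-parameter style, PDC1 can be written as an ordinary Python function over two tables (lists of attribute-to-value dicts); the sample tables below are hypothetical:

```python
def PDC1(R, L):
    # For matching populations, the average score in a RESULT tuple may
    # not equal the limit score in a LIMIT tuple.
    return all(r["AVG_SCORE"] != l["SCORE"]
               for r in R for l in L
               if r["POPULATION"] == l["POPULATION"])

# Hypothetical sample tables:
LIMIT = [{"POPULATION": "DP", "SCORE": "A"}]
assert PDC1([{"POPULATION": "DP", "AVG_SCORE": "B"}], LIMIT)      # admissible
assert not PDC1([{"POPULATION": "DP", "AVG_SCORE": "A"}], LIMIT)  # rejected
```

The benefit mirrors the text: the predicate's parameter list makes the involved tables explicit, so a caller simply passes dbs's RESULT and LIMIT components.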
Let’s now continue with the example database design and define its database universe, thereby formally specifying all database constraints. We start by defining the database characterization. Take a look at Listing 7-38, which holds the definition for DB_CHREX.
DB_CHREX :=
{ ( EMP; tab_EMP )
, ( SREP; tab_SREP )
, ( MEMP; tab_MEMP )
, ( TERM; tab_TERM )
, ( DEPT; tab_DEPT )
, ( GRD; tab_GRD )
, ( CRS; tab_CRS )
, ( OFFR; tab_OFFR )
, ( REG; tab_REG )
, ( HIST; tab_HIST )
}
Set function DB_CHREX introduces ten table aliases and attaches to each of them the relevant table universe that was specified in the previous phase. The generalized product of DB_CHREX will produce a set with many possible database states. Our database universe DB_UEX, which is defined in Listing 7-39, establishes 36 database constraints. They are specified by name only in this definition. In the sections that follow, you’ll find the formal specification for each of these named constraints.
DB_UEX :=
{ dbs | dbs∈Π(DB_CHREX) ∧
/* ======================================= */
/* Start of Subset Requirement Constraints */
/* ======================================= */
PSSR1(dbs(EMP),dbs(DEPT)) ∧ PSSR2(dbs(DEPT),dbs(EMP)) ∧
PSSR3(dbs(MEMP),dbs(EMP)) ∧ PSSR4(dbs(TERM),dbs(EMP)) ∧
PSSR5(dbs(EMP),dbs(GRD)) ∧ PSSR6(dbs(OFFR),dbs(CRS)) ∧
PSSR7(dbs(OFFR),dbs(DEPT)) ∧ PSSR8(dbs(OFFR),dbs(EMP)) ∧
PSSR9(dbs(REG),dbs(EMP)) ∧ PSSR10(dbs(REG),dbs(OFFR)) ∧
PSSR11(dbs(HIST),dbs(EMP)) ∧ PSSR12(dbs(HIST),dbs(DEPT)) ∧
/* =================================== */
/* Start of Specialization Constraints */
/* =================================== */
PSPEC1(dbs(EMP),dbs(SREP)) ∧ PSPEC2(dbs(EMP),dbs(MEMP)) ∧
/* ================================== */
/* Start of Tuple-in-Join Constraints */
/* ================================== */
PTIJ1(dbs(EMP),dbs(GRD)) ∧ PTIJ2(dbs(EMP),dbs(TERM)) ∧
PTIJ3(dbs(SREP),dbs(EMP),dbs(MEMP)) ∧ PTIJ4(dbs(EMP),dbs(MEMP)) ∧
PTIJ5(dbs(EMP),dbs(DEPT)) ∧ PTIJ6(dbs(EMP),dbs(HIST)) ∧
PTIJ7(dbs(TERM),dbs(HIST)) ∧ PTIJ8(dbs(EMP),dbs(REG)) ∧
PTIJ9(dbs(TERM),dbs(REG),dbs(CRS)) ∧
PTIJ10(dbs(EMP),dbs(REG),dbs(OFFR),dbs(CRS)) ∧
PTIJ11(dbs(EMP),dbs(OFFR)) ∧ PTIJ12(dbs(TERM),dbs(OFFR),dbs(CRS)) ∧
PTIJ13(dbs(REG),dbs(OFFR)) ∧ PTIJ14(dbs(OFFR),dbs(CRS)) ∧
PTIJ15(dbs(EMP),dbs(REG),dbs(OFFR),dbs(CRS)) ∧
/* =================================== */
/* Start of Other Database Constraints */
/* =================================== */
PODC1(dbs(TERM),dbs(MEMP)) ∧ PODC2(dbs(TERM),dbs(DEPT)) ∧
PODC3(dbs(OFFR),dbs(DEPT),dbs(EMP),dbs(CRS)) ∧ PODC4(dbs(OFFR),dbs(REG)) ∧
PODC5(dbs(OFFR),dbs(REG)) ∧ PODC6(dbs(OFFR),dbs(REG)) ∧
PODC7(dbs(OFFR),dbs(REG),dbs(EMP))
}
As you’ll see, the majority of the database constraints are based on one of the common types of predicates that were introduced in the section “Common Patterns of Table and Database Predicates” in Chapter 6; there are 12 subset requirement, two specialization, and 15 tuple-in-join constraints in this database universe definition. Finally, there are seven “other” (that is, not of a common type) database constraints. Most database constraints only involve a pair of table structures; however, some tuple-in-join and “other” database constraints involve more than two table structures.
In the remaining part of this section, you’ll find the formal specification of all database constraints whose names were introduced in the definition of DB_UEX. For our convenience, the names of the table parameters in each of these predicates are the same as the names of the relevant table structures involved in the predicate. As you can see inside the specification of DB_UEX, the database predicates are instantiated by providing the relevant tables as the arguments.
Listing 7-40 shows database predicates PSSR1 through PSSR12, which represent the subset requirements in the example database design.
PSSR1(EMP,DEPT) :=
/* Employee works for a known department */
{ e(DEPTNO) | e∈EMP } ⊆ { d(DEPTNO) | d∈DEPT }
PSSR2(DEPT,EMP) :=
/* Dept mgr is a known employee, excluding admins and president */
{ d(MGR) | d∈DEPT } ⊆
{ e(EMPNO) | e∈EMP ∧ e(JOB) ∉ {'ADMIN','PRESIDENT'} }
PSSR3(MEMP,EMP) :=
/* Employees can report to a president or a manager only */
{ m(MGR) | m∈MEMP } ⊆
{ e(EMPNO) | e∈EMP ∧ e(JOB) ∈ {'PRESIDENT','MANAGER' } }
PSSR4(TERM,EMP) :=
/* A termination is for a known employee; not everyone has left */
{ t(EMPNO) | t∈TERM } ⊂ { e(EMPNO) | e∈EMP }
PSSR5(EMP,GRD) :=
/* Employee has a known salary grade */
{ e(SGRADE) | e∈EMP } ⊆ { g(GRADE) | g∈GRD }
PSSR6(OFFR,CRS) :=
/* Course offering is for a known course */
{ o(COURSE) | o∈OFFR } ⊆ { c(CODE) | c∈CRS }
PSSR7(OFFR,DEPT) :=
/* Courses take place in locations where we have a department */
{ o(LOC) | o∈OFFR } ⊆ { d(LOC) | d∈DEPT }
PSSR8(OFFR,EMP) :=
/* Trainer of course offering is a known trainer */
{ o(TRAINER) | o∈OFFR ∧ o(TRAINER) ≠ -1 } ⊆
{ e(EMPNO) | e∈EMP ∧ e(JOB) = 'TRAINER' }
PSSR9(REG,EMP) :=
/* Course registration is for a known employee */
{ r(STUD) | r∈REG } ⊆ { e(EMPNO) | e∈EMP }
PSSR10(REG,OFFR) :=
/* Course registration is for a known course offering */
{ r↓{COURSE,STARTS} | r∈REG } ⊆ { o↓{COURSE,STARTS} | o∈OFFR }
PSSR11(HIST,EMP) :=
/* History record is for a known employee */
{ h(EMPNO) | h∈HIST } ⊆ { e(EMPNO) | e∈EMP }
PSSR12(HIST,DEPT) :=
/* History record is for a known department */
{ h(DEPTNO) | h∈HIST } ⊆ { d(DEPTNO) | d∈DEPT }
Most subset requirements are straightforward; PSSR1, 5, 6, 9, 10, 11, and 12 all state that every tuple in the first argument is related to some tuple in the second argument.
You’ll probably find PSSR7 a bit unusual. It states that {LOC} in OFFR references {LOC} in DEPT. Usually, given such a subset requirement, {LOC} would be uniquely identifying in DEPT; however, this is not the case here. This means that an OFFR tuple might be related to more than one DEPT tuple.
Note Apparently the example database design does not need to hold information with regard to locations, other than the name of a location. If that were the case, then a separate table structure, say with alias LOC, would have been introduced to hold location information. Also, two subset requirements, one from DEPT to LOC and one from OFFR to LOC, would have been introduced.
Predicates PSSR2 and PSSR3 state that every tuple in the first parameter value is related to some tuple in a subset of the second parameter value; the set specified at the right-hand side denotes a table restriction in these cases.
Predicate PSSR8 states that every tuple in a subset of the first parameter value is related to some tuple in a subset of the second parameter value; the sets specified at both sides denote table restrictions.
Finally, you might have noticed that inside predicate PSSR4 the proper subset operator is used; this ensures that not every employee has been terminated (at least one employee still works for the company).
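Subset requirements translate directly into Python's set operators: `<=` is ⊆ and `<` is the proper subset ⊂. A sketch of PSSR1 and PSSR4 over hypothetical sample data (a PSSR2-style restriction would simply add a filter to the right-hand comprehension):

```python
def PSSR1(EMP, DEPT):
    # Every employee works for a known department.
    return {e["DEPTNO"] for e in EMP} <= {d["DEPTNO"] for d in DEPT}

def PSSR4(TERM, EMP):
    # Proper subset: terminations reference known employees, AND at
    # least one employee has not left.
    return {t["EMPNO"] for t in TERM} < {e["EMPNO"] for e in EMP}

# Hypothetical sample data:
DEPT = [{"DEPTNO": 10}, {"DEPTNO": 20}]
EMP  = [{"EMPNO": 100, "DEPTNO": 10}, {"EMPNO": 101, "DEPTNO": 20}]
assert PSSR1(EMP, DEPT)
assert not PSSR1(EMP + [{"EMPNO": 102, "DEPTNO": 30}], DEPT)  # dept 30 unknown
assert PSSR4([{"EMPNO": 101}], EMP)
assert not PSSR4([{"EMPNO": 100}, {"EMPNO": 101}], EMP)       # everyone left
```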
You can find the definitions for the specialization constraints PSPEC1 and PSPEC2 in Listing 7-41.
PSPEC1(EMP,SREP) :=
/* Salesreps have a target and a commission */
{ e(EMPNO) | e∈EMP ∧ e(JOB) = 'SALESREP' } =
{ s(EMPNO) | s∈SREP }
PSPEC2(EMP,MEMP) :=
/* Everybody, excluding presidents, is a managed employee */
{ e(EMPNO) | e∈EMP ∧ e(JOB) ≠ 'PRESIDENT' } =
{ m(EMPNO) | m∈MEMP }
Predicate PSPEC1 specifies that you can find additional information for sales representatives (and no other employees) in the SREP table structure; only sales representatives have a target and a commission. Predicate PSPEC2 specifies that for all employees excluding presidents, additional information can be found in the MEMP table structure; here the employee number of the employee that acts as the manager is maintained.
Note We say that SREP and MEMP are specializations of EMP.
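A specialization constraint is a set equality rather than a subset requirement; sketched in Python over hypothetical data:

```python
def PSPEC1(EMP, SREP):
    # Exactly the sales representatives have a tuple in SREP:
    # set equality, not a mere subset requirement.
    return ({e["EMPNO"] for e in EMP if e["JOB"] == "SALESREP"}
            == {s["EMPNO"] for s in SREP})

# Hypothetical sample data:
EMP  = [{"EMPNO": 1, "JOB": "SALESREP"}, {"EMPNO": 2, "JOB": "TRAINER"}]
SREP = [{"EMPNO": 1, "TARGET": 100000, "COMM": 12000}]
assert PSPEC1(EMP, SREP)
assert not PSPEC1(EMP, [])   # salesrep 1 lacks its specialization tuple
assert not PSPEC1(EMP, SREP + [{"EMPNO": 2, "TARGET": 0, "COMM": 0}])
```

The equality cuts both ways: a missing SREP tuple for a salesrep is as inadmissible as an SREP tuple for a non-salesrep.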
Listing 7-42 shows database predicates PTIJ1 through PTIJ15, which represent the tuple-in-join constraints in the example database design.
PTIJ1(EMP,GRD) :=
/* Monthly salary must fall within assigned salary grade */
( ∀e∈EMP, g∈GRD:
e(SGRADE) = g(GRADE) ⇒ ( g(LLIMIT) ≤ e(MSAL) ∧ e(MSAL) ≤ g(ULIMIT) ) )
PTIJ2(EMP,TERM) :=
/* Leave date must fall after hire date */
( ∀e∈EMP, t∈TERM:
e(EMPNO) = t(EMPNO) ⇒ e(HIRED) < t(LEFT) )
PTIJ3(SREP,EMP,MEMP) :=
/* Salesreps cannot earn more than the employee they report to */
( ∀s∈SREP, es∈EMP, m∈MEMP, em∈EMP:
( s(EMPNO)=es(EMPNO) ∧ es(EMPNO)=m(EMPNO) ∧ m(MGR) = em(EMPNO) )
⇒
( es(MSAL) + s(COMM)/12 < em(MSAL) ) )
PTIJ4(EMP,MEMP) :=
/* Non-salesreps cannot earn more than the employee they report to */
( ∀e∈EMP, m∈MEMP, em∈EMP:
(e(EMPNO)=m(EMPNO) ∧ m(MGR) = em(EMPNO) ∧ e(JOB) ≠ 'SALESREP')
⇒
( e(MSAL) < em(MSAL) ) )
PTIJ5(EMP,DEPT) :=
/* Department manager must work for department he/she manages */
( ∀e∈EMP, d∈DEPT:
e(EMPNO) = d(MGR) ⇒ e(DEPTNO) = d(DEPTNO) )
PTIJ6(EMP,HIST) :=
/* No history records allowed at or before hire date */
( ∀e∈EMP, h∈HIST:
e(EMPNO) = h(EMPNO) ⇒ e(HIRED) < h(UNTIL) )
PTIJ7(TERM,HIST) :=
/* No history records allowed at or after leave date */
( ∀t∈TERM, h∈HIST:
t(EMPNO) = h(EMPNO) ⇒ t(LEFT) > h(UNTIL) )
PTIJ8(EMP,REG) :=
/* You cannot register for offerings in 1st four weeks on the job */
( ∀e∈EMP, r∈REG:
e(EMPNO) = r(STUD) ⇒ e(HIRED) + 28 ≤ r(STARTS) )
PTIJ9(TERM,REG,CRS) :=
/* You cannot register for offerings given at or after leave date */
( ∀t∈TERM, r∈REG, c∈CRS:
( t(EMPNO) = r(STUD) ∧ r(COURSE) = c(CODE) )
⇒
( t(LEFT) ≥ r(STARTS) + c(DUR) ) )
PTIJ10(EMP,REG,OFFR,CRS) :=
/* You cannot register for overlapping course offerings */
( ∀e∈EMP, r1,r2∈REG, o1,o2∈OFFR, c1,c2∈CRS:
( e(EMPNO) = r1(STUD) ∧
r1↓{COURSE,STARTS} = o1↓{COURSE,STARTS} ∧
o1(COURSE) = c1(CODE) ∧
e(EMPNO) = r2(STUD) ∧
r2↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∧
o2(COURSE) = c2(CODE)
) ⇒
( o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∨
o1(STARTS) ≥ o2(STARTS) + c2(DUR) ∨
o2(STARTS) ≥ o1(STARTS) + c1(DUR) ) )
PTIJ11(EMP,OFFR) :=
/* Trainer cannot teach courses before hire date */
( ∀e∈EMP, o∈OFFR:
e(EMPNO) = o(TRAINER) ⇒ e(HIRED) ≤ o(STARTS) )
PTIJ12(TERM,OFFR,CRS) :=
/* Trainer cannot teach courses at or after leave date */
( ∀t∈TERM, o∈OFFR, c∈CRS:
( t(EMPNO) = o(TRAINER) ∧ o(COURSE) = c(CODE) )
⇒
( t(LEFT) ≥ o(STARTS) + c(DUR) ) )
PTIJ13(REG,OFFR) :=
/* Trainer cannot register for offerings taught by him/herself */
( ∀r∈REG, o∈OFFR:
r↓{COURSE,STARTS} = o↓{COURSE,STARTS} ⇒
r(STUD) ≠ o(TRAINER) )
PTIJ14(OFFR,CRS) :=
/* Trainer cannot teach different courses simultaneously */
( ∀o1,o2∈OFFR, c1,c2∈CRS:
( o1(TRAINER) = o2(TRAINER) ∧
o1(COURSE) = c1(CODE) ∧
o2(COURSE) = c2(CODE)
) ⇒
( o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∨
o1(STARTS) ≥ o2(STARTS) + c2(DUR) ∨
o2(STARTS) ≥ o1(STARTS) + c1(DUR) ) )
PTIJ15(EMP,REG,OFFR,CRS) :=
/* Employee cannot register for course offerings that overlap */
/* with another course offering where he/she is the trainer */
( ∀e∈EMP, r∈REG, o1,o2∈OFFR, c1,c2∈CRS:
( e(EMPNO) = r(STUD) ∧
r↓{COURSE,STARTS} = o1↓{COURSE,STARTS} ∧
o1(COURSE) = c1(CODE) ∧
e(EMPNO) = o2(TRAINER) ∧
o2(COURSE) = c2(CODE)
) ⇒
( o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∨
o1(STARTS) ≥ o2(STARTS) + c2(DUR) ∨
o2(STARTS) ≥ o1(STARTS) + c1(DUR) ) )
Predicates PTIJ1 to PTIJ15 all limit the tuples that are allowed in some join. Most of these joins involve only two table structures. Five predicates (PTIJ3, PTIJ9, PTIJ10, PTIJ12, and PTIJ15) limit the tuples in joins that involve more than two table structures. Let’s take a closer look at those five.
Predicate PTIJ3 joins an SREP tuple (variable s) with its related EMP tuple (variable es), continues to join this EMP tuple with its related MEMP tuple (variable m), and finally joins this MEMP tuple with its related EMP tuple (variable em). All tuples that are in this join across four tables (note that EMP is joined twice) are constrained by the predicate es(MSAL) + s(COMM)/12 < em(MSAL). This states that the monthly salary of the sales representative, increased by a twelfth of his/her commission, should be less than the monthly salary of the employee that is managing this sales representative. By the way, due to subset requirement PSSR3, this managing employee cannot be a sales representative (which might otherwise have required adding a twelfth of a commission at the right-hand side of the less-than symbol too).
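The four-way join in PTIJ3 can be mimicked with nested comprehensions; a sketch with hypothetical salary data (note that EMP is scanned twice, once as es and once as em):

```python
def PTIJ3(SREP, EMP, MEMP):
    # Salesreps (salary plus a twelfth of commission) must earn less
    # than the employee they report to.
    return all(
        es["MSAL"] + s["COMM"] / 12 < em["MSAL"]
        for s in SREP
        for es in EMP if es["EMPNO"] == s["EMPNO"]    # SREP -> EMP
        for m in MEMP if m["EMPNO"] == es["EMPNO"]    # EMP  -> MEMP
        for em in EMP if em["EMPNO"] == m["MGR"]      # MEMP -> EMP (again)
    )

# Hypothetical data: salesrep 2 reports to employee 1.
EMP  = [{"EMPNO": 1, "MSAL": 5000}, {"EMPNO": 2, "MSAL": 3000}]
MEMP = [{"EMPNO": 2, "MGR": 1}]
assert PTIJ3([{"EMPNO": 2, "COMM": 12000}], EMP, MEMP)      # 3000+1000 < 5000
assert not PTIJ3([{"EMPNO": 2, "COMM": 30000}], EMP, MEMP)  # 3000+2500 >= 5000
```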
Predicate PTIJ9 joins a TERM tuple (t) directly with a REG tuple (r) and continues to join the REG tuple with its related CRS tuple (c).
Note A few notes:
The join from TERM to REG is not done “via EMP.” Because TERM is a specialization of EMP, you can join TERM directly to any other table structure that EMP can join to using the EMPNO attribute.
The join from REG to CRS is not done “via OFFR.” From the two subset requirements PSSR6 and PSSR10, you can deduce that {COURSE} in REG references {CODE} in CRS (a subset requirement between REG and CRS), and you can thus join REG directly to CRS in the way shown.
Tuples in this join are constrained by the predicate t(LEFT) ≥ r(STARTS) + c(DUR). Do you notice that this formal specification differs somewhat from the embedded informal comment? The formal specification involves the CRS table structure too, or rather its DUR attribute. Here’s a more correct informal explanation of this constraint: an employee cannot register for a course offering that starts at or after his or her leave date, nor can an employee be terminated during a course offering that he or she is attending as a student.
Now you’re going down a path that is often hard to avoid when you try to describe database constraints informally: you start talking in terms of transactions that are not allowed given the constraint. This is what just happened earlier: you cannot insert a REG tuple given a certain TERM tuple, and vice versa, you cannot insert a TERM tuple given certain REG tuples. With the increasing scope of data covered by a constraint, it becomes increasingly difficult to state informally, using a natural language, the what of a constraint without alienating yourself from your customer. You often must dive into the how of a constraint. In your communication with the end users, this is not a bad thing; you start communicating in terms of how the system will behave (what transactions it will not accept given the constraint), and end users typically understand this. Your formal specification is the basis from which you can generate this behavior (we will deal with this in detail in Chapter 11). For instance, you might want to briefly mention to the user that given this constraint, the duration of a course cannot be increased (updated) under certain circumstances.
Predicate PTIJ10 joins an EMP tuple with two related REG tuples (r1 and r2), continues to join each of these two REG tuples with their related OFFR tuple (o1 and o2), and finally joins the two OFFR tuples with their related CRS tuple (c1 and c2). Tuples in this join are constrained by the following predicate:
o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∨
o1(STARTS) ≥ o2(STARTS) + c2(DUR) ∨
o2(STARTS) ≥ o1(STARTS) + c1(DUR)
This predicate states that if the join doesn’t involve the same two OFFR tuples, then offering o1 must start after offering o2 ends, or vice versa.
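The non-overlap disjunction used by PTIJ10 (and reused by PTIJ14 and PTIJ15) can be isolated as a small helper; encoding STARTS as a day number is a hypothetical simplification of the date arithmetic:

```python
def offerings_admissible(o1, c1, o2, c2):
    # Two offerings are admissible together iff they are the same
    # offering, or their date ranges do not overlap.
    same = (o1["COURSE"], o1["STARTS"]) == (o2["COURSE"], o2["STARTS"])
    return (same
            or o1["STARTS"] >= o2["STARTS"] + c2["DUR"]
            or o2["STARTS"] >= o1["STARTS"] + c1["DUR"])

# Hypothetical data: STARTS as day numbers, DUR in days.
c  = {"CODE": "SQL", "DUR": 3}
o1 = {"COURSE": "SQL", "STARTS": 10}   # occupies days 10..12
o2 = {"COURSE": "SQL", "STARTS": 13}   # starts the day after o1 ends
o3 = {"COURSE": "SQL", "STARTS": 12}   # overlaps o1's last day
assert offerings_admissible(o1, c, o2, c)
assert not offerings_admissible(o1, c, o3, c)
```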
Predicate PTIJ12 is the “trainer” version of PTIJ9. Instead of joining TERM via REG (using the STUD attribute) to CRS, predicate PTIJ12 joins TERM via OFFR (using the TRAINER attribute) to CRS and limits the resulting tuples in the join in the same way.
Predicate PTIJ15 joins an EMP tuple to a CRS tuple via REG and OFFR (representing the student that attends a course offering), and continues to join that same EMP tuple to a CRS tuple via OFFR (representing the trainer that gives an offering). It then restricts the tuples in this join in the same way as predicate PTIJ10 does.
You might have noticed a correlation between predicates PTIJ15 and PTIJ13; we will deal with this in an exercise at the end of this chapter.
Listing 7-43 shows database predicates PODC1 through PODC7, which represent the remaining seven other database constraints in the example database design.
PODC1(TERM,MEMP) :=
/* Active employee cannot be managed by terminated employee */
{ t1(EMPNO) | t1∈TERM } ∩
{ m(MGR) | m∈MEMP ∧
¬ ( ∃t2∈TERM: t2(EMPNO) = m(EMPNO) ) } = Ø
PODC2(TERM,DEPT) :=
/* Department cannot be managed by a terminated employee */
{ t(EMPNO) | t∈TERM } ∩ { d(MGR) | d∈DEPT } = Ø
PODC3(OFFR,DEPT,EMP,CRS) :=
/* At least half of the course offerings taught by a trainer */
/* must be at home base */
( ∀e1∈{ o1(TRAINER) | o1∈OFFR ∧ o1(STATUS) ≠ 'CANC' }:
( Σt∈{ o2∪c2 | d2∈DEPT ∧ e2∈EMP ∧ o2∈OFFR ∧ c2∈CRS ∧
e2(EMPNO) = e1 ∧
e2(EMPNO) = o2(TRAINER) ∧
e2(DEPTNO) = d2(DEPTNO) ∧
o2(COURSE) = c2(CODE) ∧
o2(STATUS) ≠ 'CANC' ∧
o2(LOC) = d2(LOC)
} : t(DUR)
) ≥
( Σt∈{ o3∪c3 | d3∈DEPT ∧ e3∈EMP ∧ o3∈OFFR ∧ c3∈CRS ∧
e3(EMPNO) = e1 ∧
e3(EMPNO) = o3(TRAINER) ∧
e3(DEPTNO) = d3(DEPTNO) ∧
o3(COURSE) = c3(CODE) ∧
o3(STATUS) ≠ 'CANC' ∧
o3(LOC) ≠ d3(LOC)
} : t(DUR) ) )
PODC4(OFFR,REG) :=
/* Offerings with 6+ registrations must have status confirmed */
( ∀o∈OFFR:
#{ r | r∈REG ∧
r↓{COURSE,STARTS} = o↓{COURSE,STARTS} } ≥ 6
⇒
o(STATUS) = 'CONF' )
PODC5(OFFR,REG) :=
/* Number of registrations cannot exceed maximum capacity of offering */
( ∀o∈OFFR:
#{ r | r∈REG ∧
r↓{COURSE,STARTS} = o↓{COURSE,STARTS} } ≤ o(MAXCAP) )
PODC6(OFFR,REG) :=
/* Canceled offerings cannot have registrations */
( ∀o∈OFFR: o(STATUS) = 'CANC'
⇒
¬( ∃r∈REG: r↓{COURSE,STARTS} = o↓{COURSE,STARTS} ) )
PODC7(OFFR,REG,EMP) :=
/* You are allowed to teach a certain course only if: */
/* 1. You have been employed for at least one year, or */
/* 2. You have attended that course first and the trainer of that */
/* course offering attends your first teach as participant */
( ∀o1∈OFFR:
/* If this is the 1st time this trainer gives this course ... */
( ¬∃o2∈OFFR:
o1↓{COURSE,TRAINER} = o2↓{COURSE,TRAINER} ∧
o2(STARTS) < o1(STARTS)
) ⇒
(/* then there should be an attendee in the classroom ... */
( ∃r1∈REG:
r1↓{COURSE,STARTS} = o1↓{COURSE,STARTS} ∧
/* who has given this course at an earlier date ... */
( ∃o3∈OFFR:
o3(TRAINER) = r1(STUD) ∧
o3(COURSE) = o1(COURSE) ∧
o3(STARTS) < o1(STARTS) ∧
/* and *that* course was attended by the current trainer */
( ∃r2∈REG:
o3↓{COURSE,STARTS} = r2↓{COURSE,STARTS} ∧
r2(STUD) = o1(TRAINER) )
) ) ) ∨
/* or, this trainer has been employed for at least one year */
( ↵{ e(HIRED) | e∈EMP ∧ e(EMPNO) = o1(TRAINER) } <
o1(STARTS) - 365 ) ) )
Predicate PODC1 states that the intersection of two sets should be the empty set. The first set holds all employee numbers of terminated employees. The second set holds all employee numbers of employees who manage an active (that is, not terminated) employee. By specifying that the intersection of these two sets is empty, you represent the informal requirement that an active employee cannot be managed by a terminated employee.
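An empty-intersection constraint like PODC1 maps naturally onto Python's `&` operator on sets; a sketch over hypothetical data:

```python
def PODC1(TERM, MEMP):
    # Active employee cannot be managed by a terminated employee.
    terminated = {t["EMPNO"] for t in TERM}
    # Managers of employees that have *not* been terminated:
    mgrs_of_active = {m["MGR"] for m in MEMP
                      if m["EMPNO"] not in terminated}
    return terminated & mgrs_of_active == set()

# Hypothetical data: employee 2 has left the company.
TERM = [{"EMPNO": 2}]
assert PODC1(TERM, [{"EMPNO": 3, "MGR": 1}])      # 3 managed by active 1
assert not PODC1(TERM, [{"EMPNO": 3, "MGR": 2}])  # 3 managed by terminated 2
```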
In the same way, predicate PODC2 represents the informal requirement that a department is managed by an active employee.
Predicate PODC3 universally quantifies over all employee numbers of employees that act as the trainer of some offering that has not been canceled. For each such employee number, it builds the set of (not canceled) OFFR tuples joined with their related CRS tuple, where the trainer of the offering equals the given employee (number), and the location of the offering is equal to the location of the department in which the given employee works. It then sums the DUR attribute values of all tuples in this set. The summed duration of offerings given by the same trainer at a location that differs from the trainer’s department location is determined in the same way. For every given trainer, the former sum cannot be less than the latter sum.
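A sketch of PODC3's per-trainer summation over hypothetical data; for simplicity it assumes every offering it inspects has a real trainer (the -1 "no trainer" marker from PSSR8 is filtered out):

```python
def PODC3(OFFR, DEPT, EMP, CRS):
    # Per trainer: home-base teaching days >= away teaching days.
    dur  = {c["CODE"]: c["DUR"] for c in CRS}
    dloc = {d["DEPTNO"]: d["LOC"] for d in DEPT}
    home = {e["EMPNO"]: dloc[e["DEPTNO"]] for e in EMP}
    live = [o for o in OFFR
            if o["STATUS"] != "CANC" and o["TRAINER"] != -1]
    for t in {o["TRAINER"] for o in live}:
        taught  = [o for o in live if o["TRAINER"] == t]
        at_home = sum(dur[o["COURSE"]] for o in taught
                      if o["LOC"] == home[t])
        away    = sum(dur[o["COURSE"]] for o in taught
                      if o["LOC"] != home[t])
        if at_home < away:
            return False
    return True

# Hypothetical data: trainer 1 is based in DALLAS; the course lasts 3 days.
DEPT = [{"DEPTNO": 10, "LOC": "DALLAS"}]
EMP  = [{"EMPNO": 1, "DEPTNO": 10}]
CRS  = [{"CODE": "SQL", "DUR": 3}]
o_home = {"COURSE": "SQL", "TRAINER": 1, "STATUS": "CONF", "LOC": "DALLAS"}
o_away = {"COURSE": "SQL", "TRAINER": 1, "STATUS": "CONF", "LOC": "BOSTON"}
assert PODC3([o_home, o_away], DEPT, EMP, CRS)                 # 3 vs 3 days
assert not PODC3([o_home, o_away, dict(o_away)], DEPT, EMP, CRS)  # 3 vs 6
```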
Of course, this is not the way you communicate with your users. You stick to the English sentence provided by them (see the embedded comments in the code) and verify the meaning of the terminology that they use in that sentence. The verb “taught” in “taught by a trainer” clearly means that the offering must have status scheduled or confirmed; canceled offerings are not to be considered. It appears that counting offerings involves the duration of the offering. Finally, an offering is considered to be “at home base” if its location is the same as the location of the department that employs the trainer.
Note Constraint PODC3 requires that attribute LOC of the OFFR table structure can (meaningfully) be compared to attribute LOC of the DEPT table structure. Constraint PSSR7 has been introduced to ensure exactly this requirement.
Predicate PODC4 states that offerings for which six or more students have registered must have status confirmed. It does this by inspecting the cardinality of the set that holds all registrations related to a given offering. If this cardinality is six or more, then the status of the given offering must be confirmed.
The structure of predicate PODC5 is similar to that of PODC4. It universally quantifies over all offerings, and requires that the number of related registrations for the given offering cannot exceed the maximum capacity of the given offering.
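Both PODC4 and PODC5 hinge on the cardinality of the set of registrations related to an offering; sketched in Python with hypothetical data:

```python
def regs_for(o, REG):
    """Registrations related to offering o (same course and start date)."""
    return [r for r in REG
            if (r["COURSE"], r["STARTS"]) == (o["COURSE"], o["STARTS"])]

def PODC4(OFFR, REG):
    # Six or more registrations force status confirmed.
    return all(o["STATUS"] == "CONF"
               for o in OFFR if len(regs_for(o, REG)) >= 6)

def PODC5(OFFR, REG):
    # Registrations never exceed the offering's maximum capacity.
    return all(len(regs_for(o, REG)) <= o["MAXCAP"] for o in OFFR)

# Hypothetical data: one offering with six students registered.
o   = {"COURSE": "SQL", "STARTS": 10, "STATUS": "CONF", "MAXCAP": 10}
REG = [{"COURSE": "SQL", "STARTS": 10, "STUD": n} for n in range(6)]
assert PODC4([o], REG) and PODC5([o], REG)
assert not PODC4([dict(o, STATUS="SCHED")], REG)  # 6 registrations, not CONF
assert not PODC5([dict(o, MAXCAP=5)], REG)        # capacity exceeded
```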
Note The combination of predicates PODC4 and PODC5 implies that it would be useless to allow an offering to have a maximum capacity of less than six; such an offering could never reach status confirmed. This is exactly why the attribute value set of attribute MAXCAP starts at value six (see Listing 7-9).
Predicate PODC6 states that registrations for canceled offerings cannot exist. Consequently, whenever an offering gets canceled, all related registrations are to be removed.
Predicate PODC7 constitutes the pièce de résistance of our example database universe. It requires that the first-time offering of every course taught by a trainer satisfies a certain condition. If the given trainer of such an offering has not been employed for at least one year, then this trainer must have attended a prior offering of the same course, and the trainer of that prior course offering must attend this first-time offering. Predicate PODC7 carries additional embedded comments to guide you through its formal specification.
When you try to read and understand a complex predicate such as PODC7, it’s useful to first study the top-down structure of the predicate. In this case, the structure of the predicate is as follows:
( ∀o1∈OFFR: P1(o1,OFFR) ⇒ ( P2(o1,OFFR,REG) ∨ P3(o1,EMP) ) )
Predicate PODC7 is a universal quantification over all offerings, and states that if some condition (P1) that involves the given offering and other offerings is TRUE, then a disjunction of two conditions (P2 and P3) should hold. Condition P1 states that the given offering is the first-time offering. Condition P2 involves the given offering, other offerings, and registrations; it states the complex requirement with regard to the prior offering that must have been attended by the trainer of the given offering. Condition P3 involves the given offering and employees; it states that the trainer of the given offering has been employed for at least one year.
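The P1 ⇒ (P2 ∨ P3) skeleton can be traced in Python; the helpers P1, P2, and P3 follow the decomposition above, and the data is hypothetical (HIRED and STARTS as day numbers):

```python
def P1(o1, OFFR):
    # First teach: no earlier offering of this course by this trainer.
    return not any(o2["COURSE"] == o1["COURSE"]
                   and o2["TRAINER"] == o1["TRAINER"]
                   and o2["STARTS"] < o1["STARTS"]
                   for o2 in OFFR)

def P2(o1, OFFR, REG):
    # Some attendee of o1 gave this course earlier, and that earlier
    # offering was attended by o1's trainer.
    return any(
        (r1["COURSE"], r1["STARTS"]) == (o1["COURSE"], o1["STARTS"])
        and any(o3["TRAINER"] == r1["STUD"]
                and o3["COURSE"] == o1["COURSE"]
                and o3["STARTS"] < o1["STARTS"]
                and any((r2["COURSE"], r2["STARTS"]) ==
                        (o3["COURSE"], o3["STARTS"])
                        and r2["STUD"] == o1["TRAINER"]
                        for r2 in REG)
                for o3 in OFFR)
        for r1 in REG)

def P3(o1, EMP):
    # Trainer employed for at least one year before the offering starts.
    hired = next(e["HIRED"] for e in EMP if e["EMPNO"] == o1["TRAINER"])
    return hired < o1["STARTS"] - 365

def PODC7(OFFR, REG, EMP):
    return all((not P1(o1, OFFR)) or P2(o1, OFFR, REG) or P3(o1, EMP)
               for o1 in OFFR)

# Hypothetical data: trainer 1, hired on day 0, first teaches on day 400.
OFFR = [{"COURSE": "SQL", "STARTS": 400, "TRAINER": 1}]
assert PODC7(OFFR, [], [{"EMPNO": 1, "HIRED": 0}])        # admissible via P3
assert not PODC7(OFFR, [], [{"EMPNO": 1, "HIRED": 200}])  # neither P2 nor P3
```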
Note Constraint PODC7 implies that the very first offering given for every course must be given by a trainer that has been employed for at least one year. Evidently it takes at least a year to develop the material and reach the skill level necessary to offer a course.
This concludes the demonstration of how to formally specify a relational database design. You have seen how you can use set theory, in combination with logic, to produce a rock-solid database design specification. The method demonstrated also gives us a good and clear insight into the relevant constraints that play a role in the various layers of such a database design.
This section provides a high-level summary of the material discussed in this chapter, formatted as a bulleted list.
tup_OFFR:
o(STATUS)='CONF' ⇒ o(TRAINER) ≠ -1
What is your response?
Change tab_MEMP so that it includes a constraint that states that no manager can directly manage more than ten employees.
In tab_GRD, there is currently no limit to the number of salary grades that are allowed to overlap with one another. Add a table constraint that ensures that a salary value falls within at most two salary grades.
Specify DBS1 using the longhand notation for tables.
Given universe DB_U2, what are the admissible RESULT tables that can be combined with the following LIMIT table?
{ { (POPULATION;'DP'), (SCORE;'A') },
  { (POPULATION;'NON-DP'), (SCORE;'F') } }
PTIJ5 implies that no employee can manage more than one department. This renders the third table constraint of table universe tab_DEPT useless: it can never be violated given PTIJ5.
Change PTIJ5 such that if the employee manages two departments, then one of those must be the department in which the employee is employed.
PODC6 is a disguised tuple-in-join predicate. Can you provide the alternative formal specification that follows the structure of a tuple-in-join predicate?
Rewrite PODC1 using quantifiers.
Employee 1000 has a special status within the company. Take a look at the following database constraint with regard to this employee:
PODC8(OFFR,REG) :=
( ∀o∈OFFR: ( ∃r∈REG: r↓{COURSE,STARTS}=o↓{COURSE,STARTS} ∧
             r(STUD)=1000 )
  ⇒
  ( #{ r2 | r2∈REG ∧ r2↓{COURSE,STARTS}=o↓{COURSE,STARTS} } = 1
    ∧
    ¬( ∃o2∈OFFR: o2↓{LOC,STARTS}=o↓{LOC,STARTS} ∧
                 o2(COURSE)≠o(COURSE) )
  )
)
This constraint states that whenever employee 1000 is attending an offering, then no other students are allowed to attend this offering, nor can there be any other offering (for a different course) starting at the same date and location as this offering. This constraint is actually a conjunction of two constraints, where one is a (disguised) table constraint and the other a (simpler) database constraint. Find out what these two constraints are by applying various rewrite rules introduced in Part 1 of this book.
db_empty:
{ ( EMP; Ø ), ( SREP; Ø ), ( MEMP; Ø ), ( TERM; Ø ), ( DEPT; Ø ),
( GRD; Ø ), ( CRS; Ø ), ( OFFR; Ø ), ( REG; Ø ), ( HIST; Ø ) }
Is state db_empty an element of database universe DB_UEX?
Extend the example database design with a BONUS table structure. Here’s the external predicate of this table structure: “The employee with employee number EMPNO has received a bonus of AMOUNT dollars in year YEAR.” The following constraints apply with regard to this new table structure:
Provide a characterization, tuple universe, and table universe for the BONUS table structure and (if necessary) modify the DB_UEX universe.
DB_UEX universe.
By changing PTIJ15, you can have it include the limitation imposed by predicate PTIJ13; it then implies PTIJ13, which means you can get rid of PTIJ13. Do you see how to change PTIJ15?