images

In this chapter, we’ll give a demonstration of how you can formally specify a database design. Formalizing a database design specification has the advantage of avoiding any ambiguity in the documentation of not only the database structure but, even more importantly, of all involved data integrity constraints.

imagesNote Within the IT industry, the term business rules is often used to denote what this book refers to as data integrity constraints. However, because a clear definition of what exactly is meant by business rules is seldom given, we cannot be sure about this. In this book, we prefer not to use the term business rules, but instead use data integrity constraints. In this chapter, we’ll give a clear definition of the latter term.

We’ll give the formal specification of a database design by defining the data type of a database variable. This data type—essentially a set—holds all admissible database states for the database variable and is dubbed the database universe.

You’ll see that a database universe can be constructed in a phased (layered) manner, which along the way provides us with a clear classification schema for data integrity constraints.

First, you define what the vocabulary is. What are the things, and aspects of these things in the real world, that you want to deal with in your database? Here you specify a name for each table structure that is deemed necessary, and the names of the attributes that the table structure will have. We’ll introduce an example database design to demonstrate this. The vocabulary is formally defined in what is called a database skeleton. A good way to further explain the meaning of all attributes (and their correlation) is to provide the external predicate for each table structure; this is a natural language sentence describing the meaning and correlation of the involved attributes.

Given the database skeleton, we then define for each attribute the set of admissible attribute values. This is done by introducing a characterization for each table structure. You were introduced to the concept of a characterization in Chapter 4.

You’ll then use these characterizations as building blocks to construct the set of admissible tuples for each table. This is called a tuple universe, and includes the formal specification of tuple constraints.

Then, you’ll use the tuple universes to build the set of admissible tables for each table structure. This set is called a table universe, and can be considered the data type of a table variable. The definition of a table universe will include the formal specification of the relevant table constraints.

The last section of this chapter shows how you can bring together the table universes in the definition of the set of admissible database states, which was the goal set out for this chapter: to define a database universe. In this phase you formally specify the database (multi-table) constraints.

Because the example database universe presented in this chapter has ten table structures, we’ll introduce you to ten characterizations, ten tuple universes, and ten table universes. This, together with the explanatory material provided, makes this chapter a rather big one. However, the number of examples should provide you with a solid head start to applying the formal theory, and thereby enable you to start practicing this methodology in your job as a database professional. You can find a version of this example database design specification that includes the design’s bare essentials in Appendix A.

After the “Chapter Summary” section, a section with exercises focuses primarily on changing or adding constraint specifications in the various layers of the example database universe introduced in this chapter.

Documenting Databases and Constraints

Because you’re reading this book, you consider yourself a database professional. Therefore, it’s likely that the activity of specifying database designs is part of your job. You’ll probably agree that the process of designing a database roughly consists of two major tasks:

  1. Discovering the things in the real world for which you need to introduce a table structure in your database design. This is done by interviewing and communicating with the users and stakeholders of the information system that you’re trying to design.
  2. Discovering the data integrity constraints that will control the data that’s maintained in the table structures. These constraints add meaning to the table structures introduced in step one, and will ultimately make the database design a satisfactory fit for the reality that you’re modeling.

The application of the math introduced in Part 1 of this book is primarily geared to the second task; it enables you to formally specify the data integrity constraints. We’re convinced that whenever you design a database, you should spend the biggest part of time on designing the involved data integrity constraints. Accurately—that is, unambiguously—documenting these data integrity constraints can spell the difference between your success and failure.

Still, today documenting data integrity constraints is most widely done using natural language, which often produces a quick dive into ambiguity. If you use plain English to express data integrity constraints, you’ll inevitably hit the problem of how the English sentence maps, unambiguously, into the table structures. Different programmers (and users alike) will interpret such sentences differently, because they all try to convert these into something that will map into the database design. Programmers then code their perception of the constraint (not necessarily the specifier’s).

The sections that follow will demonstrate that the logic and set theory introduced in Part 1 lends itself excellently to capturing database designs with their integrity constraints in a formal manner. Formal specifications of data integrity constraints tell you exactly how they map into the table structures. You’ll not only avoid the ambiguity mentioned earlier, but moreover you’ll get a clear and expert view of the most important aspect of a database: all involved data integrity constraints.

imagesNote Some of you will be surprised, by the example that follows, of how much of the overall specification of an information system actually sits in the specification of the database design. A lot of the “business logic” involved in an information system can often be represented by data integrity constraints that map into the underlying table structures that support the information system.

The Layers Inside a Database Design

Having set the scene, we’ll now demonstrate how set theory and logic enable you to get a clear and professional view of a database design and its integrity constraints. The next two sections introduce you (informally) to a particular way of looking at the quintessence of a database design. This view is such that it will enable a layered set-theory specification of a database design.

Top-Down View of a Database

A database (state) at any given point in time is essentially a set of tables. Our database, or rather our database variable, holds the current database state. In the course of time, transactions occur that assign new database states to the database variable. We need to specify the set of all admissible database states for our database variable. This set is called the database universe, and in effect defines the data type for the database variable. Viewed top down, within the database universe for a given database design that involves say n table structures, you can observe the following:

  • Every database state is an admissible set of n tables (one per table structure), where
  • every table is an admissible set of tuples, where
  • every tuple is an admissible set of attribute-value pairs, where
  • every value is an admissible value for the given attribute.

Because all preceding layers are sets, you can define them all mathematically using set theory. Through logic (by adding embedded predicates) you define exactly what is meant by admissible in each layer; here the data integrity constraints enter the picture.

So how do you specify, in a formal way, this set called the database universe? This is done in a bottom-up approach using the same layers introduced earlier. First, you define what your vocabulary is: what are the things, and aspects of them in the real world, that you want to deal with in your database? In other words, what table structures do you need, and what attributes does each table structure have? This is formally defined in what is called a database skeleton.

For each attribute introduced in the database skeleton, you then define the set of admissible attribute values. You’ve already been introduced to this; in this phase all characterizations (one per table structure) are defined.

You then use the characterizations as building blocks to build (define) for each table structure the set of admissible tuples. This involves applying the generalized product operator (see Definition 4-7) and the introduction of tuple predicates. The set of admissible tuples is called a tuple universe.

You can then use the tuple universes to build for each table structure the set of admissible tables, which is called a table universe. You’ll see how this can be done in this chapter; it involves applying the powerset operator and introducing table predicates.

In the last phase you define the set of admissible database states—the database universe—using the previously defined table universes.

This methodology of formally defining the data type of a database variable was developed by the Dutch mathematician Bert De Brock together with Frans Remmen in the 1980s, and is an elegant method of accurately defining a database design, including all relevant data integrity constraints. The references De grondslagen van semantische databases (Academic Service, 1990, in Dutch) and Foundations of Semantic Databases (Prentice Hall, 1995) are books written by Bert De Brock in which he introduces this methodology.

Classification Schema for Constraints

In this bottom-up solid construction of a database universe, you explicitly only allow sets of admissible values at each of the levels described earlier. This means that at each level these sets must satisfy certain data integrity constraints. The constraints specify which sets are valid ones; they condition the contents of the sets. This leads straightforwardly to four classes of data integrity constraints:

imagesNote We’ll revisit this matter in Chapter 11 when the attribute value sets are implemented in an SQL database management system.

  • Attribute constraints: In fact, these are the attribute value sets that you specify in a characterization. You can argue whether the term “constraint” is appropriate here. A characterization simply specifies the attribute value set for every attribute (without further constraining the elements in it). However, the attribute value set does constrain the values allowed for the attribute.
  • Tuple constraints: These are the tuple predicates that you specify inside the definition of a tuple universe. The tuple predicates constrain combinations of values of different attributes within a tuple. Sometimes these constraints are referred to as inter-attribute constraints. You can specify them without referring to other tuples. For instance, here’s a constraint between attributes Job and Salary of an EMP (employee) table structure: “Employees with job President earn a monthly salary greater than 10000 dollars.”
  • Table constraints: These are table predicates that you specify inside the definition of a table universe. The table predicates constrain combinations of different tuples within the same table. Sometimes these constraints are referred to as inter-tuple constraints. You can specify them without referring to other tables. For instance: “No employee can earn a higher monthly salary than his/her manager” (here we assume the presence of a Manager attribute in the EMP table structure that references the employee’s manager).
  • Database constraints: These are database predicates that you specify inside the definition of a database universe. The database predicates constrain combinations of tables for different table structures. Sometimes these constraints are referred to as inter-table constraints. You can only specify them while referring to different table structures. For instance, there’s the omnipresent database constraint between the EMP and DEPT table structures: each employee must work for a known department.

These four classes of constraints accept or reject a given database state. They condition database states and are often referred to as static (or state) constraints; they can be checked within the context of a (static) database state. In actuality there is one more constraint class. This is the class of constraints that limit database state transitions (on grounds other than the static constraints). Predicates specifically conditioning database state transitions are referred to as dynamic (or state transition) constraints. We’ll cover these separately in Chapter 8.

Because the preceding classification scheme is driven by the scope of data that a constraint deals with, it has the advantage of being closely related to implementation issues of constraints. When you implement a database design in an SQL DBMS, you’ll be confronted with these issues, given the poor declarative support for data integrity constraints in these systems. This lack of support puts the burden upon you to develop often complex code that enforces the constraints. Chapter 11 will investigate these implementation challenges of data integrity constraints using the classification introduced here.

Specifying the Example Database Design

We’ll demonstrate the application of the theory presented in Part 1 of this book through an elaborate treatment of a database design that consists of ten table structures.

We comment up front that this database design merely serves as a vehicle to demonstrate the formal specification methodology; it is explicitly not our intention to discuss why the design is as it is. We acknowledge that some of the assumptions on which this design is based could be questionable. Also we mention up front that this design has two hacks, probably by some of you considered rather horrible. We’ll indicate these when they are introduced.

Figure 7-1 shows a diagram of the ten table structures (represented by boxes) and their mutual relationships (represented by arrows). Each of the arrows indicates a subset requirement predicate that is applicable between a pair of table structures.

images

Figure 7-1. Picture of example database

imagesNote The majority of these arrows represent what is often called many-to-one relationships and will eventually end up as foreign keys during the implementation phase in an SQL DBMS. However, this need not always be the case, as you will see. The exact meaning of each arrow will be given in the database universe specification where each arrow translates to a database constraint.

Our database holds employees (EMP) and departments (DEPT) of a company. Some of the arrows indicate the following:

  • Every employee works for a department.
  • Every department is managed by an employee.
  • Every employee is assigned to a salary grade (GRD).

Employee history (HIST) records are maintained for all salary and/or “works-for-department” changes; every history record describes a period during which one employee was assigned to a department with a specific salary.

We hold additional information for all sales representatives in a separate table structure (SREP). We hold additional information for employees who no longer work for the company (that is, they have been terminated or they resigned) in TERM. Note that we keep the EMP information for terminated employees. We also hold additional information for all managed employees (MEMP); that is, employees who have a manager assigned to them.

The database further holds information about courses (CRS), offerings (OFFR) of those courses, and registrations (REG) for those course offerings. Some more arrows show the following:

  • An offering must be taught by a trainer who works for the company.
  • An offering is for an existing course.
  • A registration records an employee as an attendee for a course offering.

You now have some idea of the information we’re maintaining in this database. In the next section, you’ll find the database skeleton. As mentioned before, it introduces the names of all attributes for every table structure. Together with the table structure names, they form the vocabulary that we have available in our database design.

Database Skeleton

The names of the things in the real world that we are representing in our database design, including the names of the attributes of interest, are introduced in what is called a database skeleton. We sometimes refer to this as the conceptual skeleton. As you saw in Chapter 5, a database skeleton is represented as a set-valued function. The domain of the skeleton function is the set of table structure names. For each name, this function yields the set of attribute names of that table structure; that is, the heading of that table structure.

Our database skeleton DB_S for the example database design is defined in Listing 7-1. Inside the specification of DB_S you see embedded comments (/* ... */) to clarify further the chosen abbreviations for the table structure and attribute names.

Listing 7-1. Database Skeleton Definition

DB_S := { (EMP;                   -- Employees
                { EMPNO          /* Employee number             */
                , ENAME          /* Employee name               */
                , JOB            /* Employee job                */
                , BORN           /* Date of birth               */
                , HIRED          /* Date hired                  */
                , SGRADE         /* Salary grade                */
                , MSAL           /* Monthly salary              */
                , USERNAME       /* Username                    */
                , DEPTNO   } )   /* Department number           */
       , (SREP;                  -- Sales Representatives
                { EMPNO          /* Employee number             */
                , TARGET         /* Sales target                */
                , COMM     } )   /* Commission                  */
       , (MEMP;                  -- Managed Employees
                { EMPNO          /* Employee number             */
                , MGR      } )   /* Manager: employee number    */
       , (TERM;                  -- Terminated Employees
                { EMPNO          /* Employee number             */
                , LEFT           /* Date of leave               */
                , COMMENTS } )   /* Termination comments        */
       , (DEPT;                  -- Departments
                { DEPTNO         /* Department number           */
                , DNAME          /* Department name             */
                , LOC            /* Location                    */
                , MGR      } )   /* Manager: employee number    */
       , (GRD;                   -- Salary Grades
                { GRADE          /* Grade code                  */
                , LLIMIT         /* Lower salary limit          */
                , ULIMIT         /* Upper salary limit          */
                , BONUS    } )   /* Yearly bonus                */
       , (CRS;                   -- Courses
                { CODE           /* Course code                 */
                , DESCR          /* Course description          */
                , CAT            /* Course category             */
                , DUR      } )   /* Duration of course in days  */
       , (OFFR;                  -- Course Offerings
                { COURSE         /* Code of course              */
                , STARTS         /* Begin date of this offering */
                , STATUS         /* Scheduled, confirmed, ...   */
                , MAXCAP         /* Max participants capacity   */
                , TRAINER        /* Trainer: employee number    */
                , LOC      } )   /* Location                    */
       , (REG;                   -- Course Registrations
                { STUD           /* Student: employee number    */
                , COURSE         /* Course code                 */

                , STARTS         /* Begin date course offering  */
                , EVAL     } )   /* Evaluation                  */
       , (HIST;                  -- Employee History Records
                { EMPNO          /* Employee number             */
                , UNTIL          /* History record end date     */
                , DEPTNO         /* Department number           */
                , MSAL     } ) } /* Monthly salary              */

Given the database skeleton, you can now write expressions such as DB_S(DEPT), which represents the set of attribute names of the DEPT table structure. The expression denotes the set {DEPTNO, DNAME, LOC, MGR}.

With this definition of the table headings, you’re now developing some more sense of what each table structure in our database design is all about—what it intends to represent. A way to clarify further the meaning of the table structures and their attributes is to provide the external predicates. An external predicate is an English sentence that involves all attributes of a table structure and supplies a statement regarding these attributes that explains their interconnected meaning. Following is the external predicate for the EMP table structure:

The employee with employee number EMPNO has name ENAME, job JOB, was born on BORN, is hired on HIRED, has a monthly salary of MSAL dollars within the SGRADE salary grade, is assigned to account USERNAME and works for the department with department number DEPTNO.

It is called external because a database management system cannot deal with this English sentence. It is meant for the (external) users of the system, and supplies an interpretation of the chosen names for the attributes. It is called a predicate because you can view this English sentence as being parameterized, where the parameters are the embedded attribute names. You can instantiate the external predicate using the tuples in the current EMP table. You do this by replacing every occurrence of an attribute name inside the sentence with the corresponding attribute value within a given tuple. The new sentence formed this way can be viewed as a proposition that can either yield TRUE or FALSE. Sentences generated in this way by the external predicate are statements about the real world represented by the table. By convention, the propositions that are constructed in this way are assumed to be TRUE. This is precisely how external predicates further clarify the meaning of your database design.

Table 7-1 lists the external predicates for all table structures introduced in the skeleton.

Table 7-1. External Predicates

Table External Predicate
EMP The employee with employee number EMPNO has name ENAME, job JOB, was born on BORN, is hired on HIRED, has a monthly salary of MSAL dollars within the SGRADE salary grade, is assigned to account USERNAME, and works for the department with department number DEPTNO.
SREP The sales representative with employee number EMPNO has an annual sales target of TARGET dollars and a yearly commission of COMM dollars.
MEMP The employee with employee number EMPNO is managed by the employee with employee number MGR.
TERM The employee with employee number EMPNO has resigned or was fired on date LEFT due to reason COMMENTS.
DEPT The department with department number DEPTNO, has name DNAME, is located at LOC, and is managed by the employee with employee number MGR.
GRD The salary grade with ID GRADE has a lower monthly salary limit of LLIMIT dollars, an upper monthly salary limit of ULIMIT dollars, and a maximum yearly bonus of BONUS dollars.
CRS The course with code CODE has description DESCR, falls in course category CAT, and has a duration of DUR days.
OFFR The course offering for the course with code COURSE that starts on STARTS, has status STATUS, has a maximum capacity of MAXCAP attendees, is offered at location LOC, and (unless TRAINER equals -1) the offering has the employee with employee number TRAINER assigned as the trainer.
REG The employee whose employee number is STUD has registered for a course with code COURSE that starts on STARTS, and (unless EVAL equals -1) has rated the course with an evaluation score of EVAL.
HIST At date UNTIL, for the employee whose employee number is EMPNO, either the department or the monthly salary (or both) have changed. Prior to date UNTIL, the department for that employee was DEPTNO and the monthly salary was MSAL.

imagesNote Have you spotted the two hacks? Apparently there are two sorts of offerings: offerings with a trainer assigned and offerings without one assigned. A similar remark can be made about registrations; some of them include an evaluation score for the course offering, and some of them don’t. In a properly designed database, you should have decomposed the offering and registration table structures into two table structures each.

These external predicates give you an informal head start with regards to the meaning of all involved table structures and their attributes that were introduced by the database skeleton. The exact meaning of this example database design will become clear as we progress through all formal phases of a database universe definition in the sections that follow.

The next section will supply a characterization for each table structure introduced in the skeleton.

Characterizations

As you saw in Chapter 4, a characterization defines the attribute value sets for the attributes of a given table structure. For a given table structure, the characterization is a set-valued function whose domain is the set of attributes of that table structure. For each attribute, the characterization yields the attribute value set for that attribute. The characterizations form the base on which the next section will build the tuple universes. You’ll then notice that the way these characterizations are defined here is very convenient. Take a look at Listing 7-2. It defines the characterization for the EMP table.

imagesNote A few notes:
        In defining the attribute value sets for the EMP table, we are using the shorthand names for sets that were introduced in Table 2-4.
        We use chr_<table structure name> as a naming convention for the characterization of a table structure.
        In the definition of chr_EMP (and in various other places) you’ll see a function called upper. This function accepts a case-sensitive string and returns the uppercase version of that string.

Listing 7-2. Characterization chr_EMP

chr_EMP :=
{ ( EMPNO;    [1000..9999]                       )
, ( ENAME;    varchar(9)                         )
, ( JOB;      /* Five JOB values allowed */
              {'PRESIDENT','MANAGER','SALESREP',
               'TRAINER','ADMIN'}                )
, ( BORN;     date                               )
, ( HIRED;    date                               )
, ( SGRADE;   [1..99]                            )
, ( MSAL;     { n | n∊number(7,2) ∧ n > 0 }     )
, ( USERNAME; /* Usernames are always in uppercase */
              { s | s∊varchar(15) ∧
                    upper(USERNAME) = USERNAME } )
, ( DEPTNO;   [1..99]                            )
}

For every attribute of table structure EMP, function chr_EMP yields the attribute value set for that attribute. You can now write expressions such as chr_EMP(EMPNO), which represents the attribute value set of the EMPNO attribute of the EMP table structure. The expression denotes set [1000..9999].

The definition of characterization chr_EMP tells us the following:

  • EMPNO values are positive integers within the range 1000 to 9999.
  • ENAME values are variable length strings with at most nine characters.
  • JOB values are restricted to the following five values: 'PRESIDENT', 'MANAGER', 'SALESREP', 'TRAINER', 'ADMIN'.
  • BORN and HIRED values are date values.
  • SGRADE values are positive integers in the range 1 to 99.
  • MSAL values are positive numbers with precision seven and scale two.
  • USERNAME values are uppercase variable length strings with at most 15 characters.
  • DEPTNO values are positive integers in the range 1 to 99.

In the remainder of our database design definition, four sets will occur quite frequently: employee numbers, department numbers, salary-related amounts, and course codes. We define shorthand names (symbols for ease of reference in the text) for them here, and use these in the characterization definitions that follow.

EMPNO_TYP   := { n | n∊number(4,0) ∧ n > 999      }
DEPTNO_TYP  := { n | n∊number(2,0) ∧ n > 0        }
SALARY_TYP  := { n | n∊number(7,2) ∧ n > 0        }
CRSCODE_TYP := { s | s∊varchar(6)  ∧ s = upper(s) }

Listings 7-3 through 7-11 introduce the characterization for the remaining table structures. You might want to revisit Table 7-1 (the external predicates) while going over these characterizations. Embedded comments clarify attribute constraints where deemed necessary.

Listing 7-3. Characterization chr_SREP

chr_SREP :=
{ ( EMPNO;  EMPNO_TYP      )
            /* Targets for sales reps are five digit numbers */
, ( TARGET; [10000..99999] )
, ( COMM;   SALARY_TYP     )
}

Listing 7-4. Characterization chr_MEMP

chr_MEMP :=
{ ( EMPNO;  EMPNO_TYP )
, ( MGR;    EMPNO_TYP )
}

Listing 7-5. Characterization chr_TERM

chr_TERM :=
{ ( EMPNO;   EMPNO_TYP    )
, ( LEFT;    date         )
, ( COMMENTS; varchar(60) )
}

Listing 7-6. Characterization chr_DEPT

chr_DEPT :=
{ ( DEPTNO; DEPTNO_TYP                                    )
, ( DNAME;  { s | s∊varchar(12) ∧ upper(DNAME) = DNAME } )
, ( LOC;    { s | s∊varchar(14) ∧ upper(LOC) = LOC }      )
, ( MGR;      EMPNO_TYP                                   )
}

Listing 7-7. Characterization chr_GRD

chr_GRD :=
{ ( GRADE;  { n | n∊number(2,0) ∧ n > 0 } )
, ( LLIMIT; SALARY_TYP                    )
, ( ULIMIT; SALARY_TYP                    )
, ( BONUS;  SALARY_TYP                    )
}

Listing 7-8. Characterization chr_CRS

chr_CRS :=
{ ( CODE;  CRSCODE_TYP         )
, ( DESCR; varchar(40)         )
           /* Course category values: Design, Generate, Build */
, ( CAT;   {'DSG','GEN','BLD'} )
           /* Course duration must be between 1 and 15 days */
, ( DUR;   [1..15]             )
}

Listing 7-9. Characterization chr_OFFR

chr_OFFR :=
{ ( COURSE;  CRSCODE_TYP            )
, ( STARTS;  date                   )
             /* Three STATUS values allowed: Scheduled, Confirmed, Canceled */
, ( STATUS;  {'SCHD','CONF','CANC'} )
             /* Maximum course offering capacity; minimum = 6 */
, ( MAXCAP;  [6..100]               )
             /* TRAINER = -1 means "no trainer assigned" */
, ( TRAINER; EMPNO_TYP ∪ { -1 }    )
, ( LOC;     varchar(14)             )
}

Listing 7-10. Characterization chr_REG

chr_REG :=
{ ( STUD; EMPNO_TYP     )
, ( COURSE; CRSCODE_TYP )
, ( STARTS; date        )
  /* -1: too early to evaluate (course is in the future) */
  /* 0: not evaluated by attendee */
  /* 1-5: regular evaluation values (from 1=bad to 5=excellent) */
, ( EVAL;    [-1..5]    )
}

Listing 7-11. Characterization chr_HIST

chr_HIST :=
{ ( EMPNO; EMPNO_TYP   )
, ( UNTIL; date        )
, ( DEPTNO; DEPTNO_TYP )
, ( MSAL;   SALARY_TYP )
}

Note that in Listing 7-9 the attribute value set for attribute TRAINER includes a special value -1 next to valid employee numbers. This value represents the fact that no trainer has been assigned yet. In our formal database design specification method, there is no such thing as a NULL, which is a “value” commonly (mis)used by SQL database management systems to indicate a missing value. There are no missing values inside tuples; they always have a value attached to every attribute. Characterizations specify the attribute value sets from which these values can be chosen. So, to represent a “missing trainer” value, you must explicitly include a value for this fact inside the corresponding attribute value set. Something similar is specifiedin Listing 7-10 in the attribute value set for the EVAL attribute.

imagesNote Appendix F will explicitly deal with the phenomenon of NULLs. Chapter 11 will revisit these -1 values when we sort out the database design implementation issues and provide guidelines.

The specification of our database design started out with a skeleton definition and the external predicates for the table structures introduced by the skeleton. In this section you were introduced to the characterizations of the example database design. Through the attribute value sets, you are steadily gaining more insight into the meaning of this database design. The following section will advance this insight to the next layer: the tuple universes.

Tuple Universes

A tuple universe is a (non-empty) set of tuples. It is a very special set of tuples; this set is meant to hold only tuples that are admissible for a given table structure. You know by now that tuples are represented as functions. For instance, here is an example function tdept1 that represents a possible tuple for the DEPT table structure:

tdept1 := {(DEPTNO;10), (DNAME;'ACCOUNTING'), (LOC;'DALLAS'), (MGR;1240)}

As you can see, the domain of tdept1 represents the set of attributes for table structure DEPT as introduced by database skeleton DB_S.

dom(tdept1) = {DEPTNO, DNAME, LOC, MGR} = DB_S(DEPT)

And, for every attribute, tdept1 yields a value from the corresponding attribute value set, as introduced by the characterization for the DEPT table structure:

  • tdept1(DEPTNO) = 10, which is an element of chr_DEPT(DEPTNO)
  • tdept1(DNAME) = 'ACCOUNTING', which is an element of chr_DEPT(DNAME)
  • tdept1(LOCATION) = 'DALLAS', which is an element of chr_DEPT(LOCATION)
  • tdept1(MGR) = 1240, which is an element of chr_DEPT(MGR)

Here’s another possible tuple for the DEPT table structure:

tdept2 := {(DEPTNO;20), (DNAME;'SALES'), (LOC;'HOUSTON'), (MGR;1755)}

Now consider the set {tdept1, tdept2}. This is a set that holds two tuples. Theoretically it could represent the tuple universe for the DEPT table structure. However, it is a rather small tuple universe; it is very unlikely that it represents the tuple universe for the DEPT table structure. The tuple universe for a given table structure should hold every tuple that we allow (admit) for the table structure.

imagesNote Tuples tdept1 and tdept2 are functions that share the same domain. This is a requirement for a tuple universe; all tuples in the tuple universe share the same domain, which in turn is equal to the heading of the given table structure.

You have already seen how you can generate a set that holds every possible tuple for a given table structure using the characterization of that table structure (see the section “Table Construction” in Chapter 5). If you apply the generalized product to a characterization, you’ll end up with a set of tuples. This set is not just any set of tuples, but it is precisely the set of all possible tuples based on the attribute value sets that the characterization defines.

Let us illustrate this once more with a small example. Suppose you’re designing a table structure called RESULT; it holds average scores for courses followed by students that belong to a certain population. Here’s the external predicate for RESULT: “The rounded average score scored by students of population POPULATION for course COURSE is AVG_SCORE.” Listing 7-12 defines the characterization chr_RESULT for this table structure.

Listing 7-12. Characterization chr_RESULT

chr_RESULT :=
{ ( POPULATION; {'DP','NON-DP'}           )
   /* DP = Database Professionals, NON-DP = Non Database Professionals */
, ( COURSE;     {'set theory','logic'}    )
, ( AVG_SCORE;  {'A','B','C','D','E','F'} )
}

The three attribute value sets represent the attribute constraints for the RESULT table structure. If you apply the generalized product Π to chr_RESULT, you get the following set of possible tuples for the RESULT table structure:

Π(chr_RESULT) =
  { { (POPULATION; 'DP'),     (COURSE; 'set theory'), (AVG_SCORE; 'A') }
  , { (POPULATION; 'DP'),     (COURSE; 'set theory'), (AVG_SCORE; 'B') }
  , { (POPULATION; 'DP'),     (COURSE; 'set theory'), (AVG_SCORE; 'C') }
  , { (POPULATION; 'DP'),     (COURSE; 'set theory'), (AVG_SCORE; 'D') }
  , { (POPULATION; 'DP'),     (COURSE; 'set theory'), (AVG_SCORE; 'E') }
  , { (POPULATION; 'DP'),     (COURSE; 'set theory'), (AVG_SCORE; 'F') }
  , { (POPULATION; 'DP'),     (COURSE; 'logic'),      (AVG_SCORE; 'A') }
  , { (POPULATION; 'DP'),     (COURSE; 'logic'),      (AVG_SCORE; 'B') }
  , { (POPULATION; 'DP'),     (COURSE; 'logic'),      (AVG_SCORE; 'C') }
  , { (POPULATION; 'DP'),     (COURSE; 'logic'),      (AVG_SCORE; 'D') }
  , { (POPULATION; 'DP'),     (COURSE; 'logic'),      (AVG_SCORE; 'E') }
  , { (POPULATION; 'DP'),     (COURSE; 'logic'),      (AVG_SCORE; 'F') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'A') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'B') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'C') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'D') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'E') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'F') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'A') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'B') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'C') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'D') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'E') }
  , { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'F') }
  }

In this set of 24 tuples, the previously defined attribute constraints will hold. However, no restrictions exist in this set with regards to combinations of attribute values of different attributes inside a tuple. By specifying inter-attribute—or rather, tuple constraints—you can restrict the set of possible tuples to the set of admissible tuples for the given table.

Suppose that you do not allow average scores D, E, and F for database professionals, nor average scores A and B for non-database professionals (regardless of the course). You can specify this by the following definition of tuple universe tup_RESULT; it formally specifies two tuple predicates:

tup_RESULT :=
   { r | r∊Π(chr_RESULT) ∧
     /* ============================ */
     /* Tuple constraints for RESULT */
     /* ============================ */
     /* Database professionals never score an average of D, E or F */
     r(POPULATION)='DP' ⇒ r(AVG_SCORE)∉{'D','E','F'} ∧
     /* Non database professionals never score an average of A or B */
     r(POPULATION)='NON-DP' ⇒ r(AVG_SCORE)∉{'A','B'}
   }

The tuple predicates introduced by the definition of a tuple universe are referred to as tuple constraints. You can also specify set tup_RESULT in the enumerative way.

imagesNote The original set of 24 possible tuples has now been reduced to a set of 14 admissible tuples. Ten tuples did not satisfy the tuple constraints that are specified in tup_RESULT.

{ { (POPULATION; 'DP'),     (COURSE; 'set theory'), (AVG_SCORE; 'A') }
, { (POPULATION; 'DP'),     (COURSE; 'set theory'), (AVG_SCORE; 'B') }
, { (POPULATION; 'DP'),     (COURSE; 'set theory'), (AVG_SCORE; 'C') }
, { (POPULATION; 'DP'),     (COURSE; 'logic'),      (AVG_SCORE; 'A') }
, { (POPULATION; 'DP'),     (COURSE; 'logic'),      (AVG_SCORE; 'B') }
, { (POPULATION; 'DP'),     (COURSE; 'logic'),      (AVG_SCORE; 'C') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'C') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'D') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'E') }
, { (POPULATION; 'NON-DP'), (COURSE; 'set theory'), (AVG_SCORE; 'F') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'C') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'D') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'E') }
, { (POPULATION; 'NON-DP'), (COURSE; 'logic'),      (AVG_SCORE; 'F') }
}

Note that the former specification of tup_RESULT, using the predicative method to specify a set, is highly preferred over the latter enumerative specification, because it explicitly shows us what the tuple constraints are (and it is a shorter definition too; much shorter in general).

Now let’s continue with our example database design. Take a look at Listing 7-13, which defines tuple universe tup_EMP for the EMP table structure of the example database design.

Listing 7-13. Tuple Universe tup_EMP

tup_EMP :=
{ e | e∊Π(chr_EMP) ∧
      /* ========================= */
      /* Tuple constraints for EMP */
      /* ========================= */
      /* We hire adult employees only */
      e(BORN) + 18 ≤ e(HIRED) ∧
      /* Presidents earn more than 120K */
      e(JOB) = 'PRESIDENT' ⇒ 12*e(MSAL) > 120000 ∧
      /* Administrators earn less than 5K */
      e(JOB) = 'ADMIN' ⇒ e(MSAL) < 5000
}

imagesNote In this definition, we assume that addition has been defined for values of type date (see Table 2-4), enabling us to add years to such a value.

Are you starting to see how this works? Tuple universe tup_EMP is a subset of Π(chr_EMP). All tuples that do not satisfy the tuple constraints (three in total) specified in the definition of tup_EMP are left out. You can use any of the logical connectives introduced in Table 1-2 of Chapter 1 in conjunction with valid attribute expressions to formally specify tuple constraints.

Note that all ambiguity is ruled out by these formal specifications:

  • By “adult,” the age of 18 or older is meant. The ≤ symbol implies that the day someone turns 18 he or she can be hired.
  • The “K” in 120K and 5K (in the comments) represents the integer 1000 and not 1024. The salaries mentioned (informally by the users and formally inside the specifications) are actually the monthly salary in the case of a CLERK and the yearly salary in the case of a PRESIDENT. This could be a habit in the real world, and it might be wise to reflect this in the formal specification too. Of course, you can also specify the predicate involving the PRESIDENT this way: e(JOB) = 'PRESIDENT'e(MSAL) > 10000.

Listings 7-14 through 7-22 introduce the tuple universes for the other table structures in our database design. You’ll find embedded informal comments to clarify the tuple constraints. Note that tuple constraints are only introduced for table structures GRD, CRS, and OFFR; the other table structures happen to have no tuple constraints.

Listing 7-14. Tuple Universe tup_SREP

tup_SREP :=
{ s | s∊Π(chr_SREP) /* No tuple constraints for SREP */ }

Listing 7-15. Tuple Universe tup_MEMP

tup_MEMP :=
{ m | m∊Π(chr_MEMP) }

Listing 7-16. Tuple Universe tup_TERM

tup_TERM :=
{ t | t∊Π(chr_TERM) }

Listing 7-17. Tuple Universe tup_DEPT

tup_DEPT :=
{ d | d∊Π(chr_DEPT) }

Listing 7-18. Tuple Universe tup_GRD

tup_GRD :=
{ g | g∊Π(chr_GRD) ∧
    /* Salary grades have a "bandwidth" of at least 500 dollars */
    g(LLIMIT) ≤ g(ULIMIT) - 500 ∧

    /* Bonus must be less than lower limit */
    g(BONUS) < g(LLIMIT)
}

Listing 7-19. Tuple Universe tup_CRS

tup_CRS :=
{ c | c∊Π(chr_CRS) ∧
      /* Build courses never take more than 5 days */
      c(CAT) = 'BLD' ⇒ c(DUR) ≤ 5
}

Listing 7-20. Tuple Universe tup_OFFR

tup_OFFR :=
{ o | o∊Π(chr_OFFR) ∧
      /* Unassigned TRAINER allowed only for certain STATUS values */
      o(TRAINER) = -1 ⇒ o(STATUS)∊{'CANC','SCHD'}
}

Listing 7-21. Tuple Universe tup_REG

tup_REG :=
{ r | r∊Π(chr_REG) }

Listing 7-22. Tuple Universe tup_HIST

tup_HIST :=
{ h | h∊Π(chr_HIST) }

Listing 7-20 defines when the special -1 value is allowed for the TRAINER attribute; confirmed offerings (STATUS = 'CONF') must have an employee number assigned as the trainer.

This concludes the tuple universe layer of our example database design. Through the specification of the tuple constraints, you’ve gained more insight into the meaning of this database design. The next section continues the construction of the database design’s specification, by advancing to the table universe layer. As you’ll see, this involves the application of more set-theory and logic concepts that were introduced in Part 1 of this book.

Table Universes

You can use a tuple universe to build a set of admissible tables (we’ll demonstrate this shortly). Such a set is called a table universe. Every element in a table universe is an admissible table for the corresponding table structure.

A tuple universe is a set of tuples and can be considered a table too. It is a rather large set of tuples, because it has every tuple that can be built using the characterization and taking into consideration the tuple constraints. We’ve mentioned before that a tuple universe can be considered the largest table for a given table structure.

Every subset of a tuple universe is a table too. In fact, if you would construct a set that holds every subset of a tuple universe, then this set would contain lots of tables; every possible table for a given table structure would be in this set. Do you remember, from Part 1 of this book, how to construct the set of all subsets of a given set? The powerset operator does just that. The powerset of a tuple universe can be considered the set of all possible tables for a given table structure.

imagesNote You might want to revisit the section “Powersets and Partitions” in Chapter 2, and refresh your memory regarding the powerset operator.

In a similar way as tuple universes are defined, you can restrict the powerset of a tuple universe to obtain the set of admissible tables. You can add table predicates (constraining combinations of tuples) to discard possible tables that were generated by the powerset operator, but that do not reflect a valid representation of the real world. Table predicates that are used to restrict the powerset of a tuple universe are referred to as table constraints.

Let’s illustrate all this using the RESULT table structure that was introduced in the previous section. The powerset of tuple universe tup_RESULT results in a set that holds every possible RESULT table. There are lots of tables in this set. To be precise, because the cardinality of tup_RESULT is 14, there are exactly 16384 (2 to the 14th power) possible tables. These are far too many to list in the enumerative way. Figure 7-2 displays just one of these tables (an arbitrarily chosen subset of tup_RESULT). Let’s name this table R1.

images

Figure 7-2. A possible table for a RESULT named R1

Table R1 is a subset of tup_RESULT. It holds 11 distinct tuples. Because these tuples originate from the tuple universe, all of them are admissible tuples; they satisfy the tuple constraints, and every attribute holds an admissible value.

Now assume that the following (informally specified) data integrity constraints play a role in a table for the RESULT table structure:

  • The combination of attributes POPULATION and COURSE is uniquely identifying in a RESULT table (constraint P1).
  • A RESULT table is empty or it holds exactly four tuples: one tuple for every combination of POPULATION and COURSE (constraint P2).
  • The average score (AVG_SCORE) of the logic course is always higher than the average score of the set theory course; score A is the highest, F the lowest (constraint P3).
  • Non-database professionals always have a lower average score than database professionals (constraint P4).

imagesNote Some of these integrity constraints are rather contrived. Their sole purpose in this example is to bring down the number of tables in the table universe of the RESULT table structure to such an extent that it becomes feasible to enumerate all admissible tables.

In Listing 7-23 you can find these four constraints formally specified as table predicates, using the names P1 through P4 introduced earlier. To be able to compare average scores in these specifications, we introduce a function f, which is defined as follows:

f := { ('A';6), ('B';5), ('C';4), ('D';3), ('E';2), ('F';1) }

This enables us to compare, for instance, scores B and E. Because f(B)=5 and f(E)=2 (and 5>2), we can say that B is a higher score than E.

Listing 7-23. Table Predicates P1, P2, P3, and P4

P1(T) := ( ∀r1,r2∊T: r1↓{POPULATION,COURSE} = r2↓{POPULATION,COURSE} ⇒ r1 = r2 )
P2(T) := ( #T = 0 ∨ #T = 4 )
P3(T) := ( ∀r1,r2∊T: ( r1(POPULATION) = r2(POPULATION) ∧
                      r1(COURSE) = 'logic' ∧ r2(COURSE) = 'set theory' )
                    ⇒ f(r1(AVG_SCORE)) > f(r2(AVG_SCORE)) )
P4(T) := ¬( ∃r1,r2∊T: r1(POPULATION) = 'NON-DP' ∧ r2(POPULATION) = 'DP' ∧
                     f(r1(AVG_SCORE)) ≥ f(r2(AVG_SCORE)) )

Table predicate P1 is one of the common types of data integrity predicates that were introduced in the section “Unique Identification Predicate” in Chapter 6.

Table predicate P2 is rather simple; the cardinality of the table should be either zero or four. Here is an alternative way to specify this: #T∊{0,4}.

As you can see, table predicate P3 specifies that the average score for the logic course should always be higher than the average score for the set theory course within a population. The first conjunct in the universal quantification—r1(POPULATION) = r2(POPULATION)—specifies this. A user can take for granted that the average scores for logic are always higher than those for set theory, within a given population, but fail to mention this explicitly (as was done in the preceding informal specification) when conveying the requirement informally.

Table predicate P4 unambiguously specifies that the “lower average score” mentioned in the informal specification is meant to be irrespective of the course; there is no conjunct r1(COURSE) = r2(COURSE) inside the existential quantification.

You can instantiate predicates P1 through P4 using table R1 that was introduced in Figure 7-2. Check for yourself that table R1 violates predicates P1, P2, and P3, and that it satisfies predicate P4.

P1(R1) = false
P2(R1) = false
P3(R1) = false
P4(R1) = true

With these formal table predicate specifications you can now define the table universe for the RESULT table structure. Take a look at Listing 7-24, which formally specifies table universe tab_RESULT using tuple universe tup_RESULT and table predicates P1, P2, P3, and P4.

Listing 7-24. Specification of Table Universe tab_RESULT

tab_RESULT :=
{ R | R∊℘(tup_RESULT) ∧ P1(R) ∧ P2(R) ∧ P3(R) ∧ P4(R)
}

tab_RESULT holds every subset of tup_RESULT that satisfies all four table predicates; obviously table R1 is not an element of tab_RESULT. The table predicates restrict the powerset of a tuple universe and are therefore referred to as table constraints.

imagesNote If a unique identification predicate constitutes a table constraint (as does P1 in the preceding case), then the set of uniquely identifying attributes is commonly referred to as a key for the given table structure. In this case {POPULATION,COURSE} is a key for the RESULT table structure.

Table constraints P1, P2, P3, and P4 are contrived such that they significantly bring down the total number of tables from the original 16384 possible tables generated by the powerset; in fact, only 13 admissible tables remain in this table universe. Listing 7-25 displays an enumerative specification of table universe tab_RESULT.

Listing 7-25. Enumerative Specification of Table Universe tab_RESULT

tab_RESULT :=
{ ∅
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'B') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'A') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'D') }

  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'C') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'B') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'A') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'E') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'C') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'B') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'A') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'C') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'B') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'A') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'E') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'B') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'A') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'B') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'A') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'E') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'C') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'A') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'E') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'C') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'A') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'C') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'A') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'E') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'C') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'B') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'E') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'C') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'B') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'D') } }
, { { (POPULATION;'DP'),     (COURSE;'set theory'), (AVG_SCORE;'C') }
  , { (POPULATION;'DP'),     (COURSE;'logic'),      (AVG_SCORE;'B') }
  , { (POPULATION;'NON-DP'), (COURSE;'set theory'), (AVG_SCORE;'F') }
  , { (POPULATION;'NON-DP'), (COURSE;'logic'),      (AVG_SCORE;'E') } }
}

This set has 13 elements, each of which is a table. The first one represents the empty table; the other 12 all are sets of 4 tuples. You might want to validate for yourself that all these tables satisfy all four table constraints specified and that there are no other tables (not listed in Listing 7-25) that also satisfy all four table constraints.

In case you have some trouble reading the nested set of sets, Figure 7-3 might clarify what is going on. It shows another way to specify table universe tab_RESULT using the shorthand notation for tables.

images

Figure 7-3. Enumerative specification of tab_RESULT using table shorthand notation

imagesNote In general, table universes hold far too many elements to enumerate them.

Let’s now continue with the specification of our example database design. Take a look at Listing 7-26, which shows the definition of table universe tab_EMP for the EMP table structure. This definition formally specifies four table constraints for EMP. Some informal clarification is embedded as comments.

Listing 7-26. Table Universe tab_EMP

tab_EMP :=
{ E | E∊ ℘(tup_EMP) ∧
      /* EMPNO uniquely identifies an employee tuple */
      ( ∀e1,e2∊E: e1(EMPNO) = e2(EMPNO) ⇒ e1 = e2 )
      ∧
      /* USERNAME uniquely identifies an employee tuple */
      ( ∀e1,e2∊E: e1(USERNAME) = e2(USERNAME) ⇒ e1 = e2 )
      ∧
      /* At most one president allowed */
      #{ e | e∊E ∧ e(JOB) = 'PRESIDENT' } ≤ 1
      /* A department that employs the president or a manager */
      /* should also employ at least one administrator */
      ( ∀d∊E⇓{DEPTNO}:
        ( ∃e2∊E: e2(DEPTNO) = d(DEPTNO) ∧ e2(JOB) ∊ {'PRESIDENT','MANAGER'} )
          ⇒
        ( ∃e3∊E: e3(DEPTNO) = d(DEPTNO) ∧ e3(JOB) = 'ADMIN' )
      )
}

The first two constraints are unique identification predicates; the EMPNO attribute is uniquely identifying in an EMP table, as is the USERNAME attribute.

The specification of the third constraint constructs the subset of tuples that have the value PRESIDENT for the JOB attribute and restricts the cardinality of this set to 1 at most. This reflects the real-world requirement that there cannot be more than one president.

The last table constraint states that if there exists a PRESIDENT or a MANAGER in a given department, then there exists an ADMIN in the same department. The outer universal quantification supplies all departments for which to perform this check. Note that the informal embedded specification of this constraint might imply that this constraint is a multi-table constraint involving not just the EMP table structure but also the DEPT table structure. Take a look at the following database predicate named P that accepts two parameters: a department table (parameter D) and an employee table (parameter E).

P(D,E) :=
   ( ∀d∊D:
       ( ∃e2∊E: e2(DEPTNO) = d(DEPTNO) ∧ e2(JOB) ∊ {'PRESIDENT','MANAGER'} )
       ⇒
       ( ∃e2∊E: e3(DEPTNO) = d(DEPTNO) ∧ e3(JOB) = 'ADMIN' )
   )

This predicate closely follows the informal specification: it quantifies over all departments and states that if the department employs a PRESIDENT or a MANAGER, then the department should employ an ADMIN. Note that only the DEPTNO attribute of parameter D is used. The corresponding predicate inside the definition of tab_EMP quantifies over all available DEPTNO values inside the EMP table and then states the same. Because a DEPTNO attribute is available in the EMP table structure (and, as we shall see later, it always identifies a valid department), this constraint can be specified as a table constraint.

imagesCaution It is not uncommon that constraints that are, in fact, table constraints are specified as database (multi-table) constraints. This can also happen one phase earlier: tuple constraints that are specified as table constraints. This refers back to the comments we made in Chapter 6 when we mentioned terms such as “tuple predicate in disguise.” In the database design specification methodology introduced in this book, you should always try to specify constraints at the earliest possible phase: attribute value sets first, then tuple constraints, then table constraints, and finally database constraints. Not only will this result in a clear view of your database design, but also will it aid in your effort to implement it (covered in Chapter 11).

Here is an alternative (equivalent) formal specification for the third table constraint of EMP that might be easier to understand, and clearly shows that there is no need to involve the DEPT table structure:

( ∀e2∊E: e2(JOB) ∊ {'PRESIDENT','MANAGER'}
           ⇒
           ( ∃e3∊E: e3(DEPTNO) = e2(DEPTNO) ∧ e3(JOB) = 'ADMIN' )
)

This predicate states that for all employee tuples that represent a PRESIDENT or a MANAGER, there should exist an employee tuple that represents an ADMIN working in the same department. You probably intuitively feel that this formal specification is equivalent, and therefore represents the same constraint. In fact, it can formally be proven that it is equivalent to the formal specification of this constraint inside tab_EMP. However, this requires the development of more rewrite rules regarding quantifications, which is beyond the scope of this book.

Listings 7-27, 7-28, and 7-29 define the table universes for the SREP, MEMP, and TERM table structures. As you can see, they all have a unique identification constraint and no other table constraints.

Listing 7-27. Table Universe tab_SREP

tab_SREP :=
{ S | S∊ ℘(tup_SREP) ∧
      /* EMPNO uniquely identifies a tuple */
      ( ∀s1,s2∊S: s1(EMPNO) = s2(EMPNO) ⇒ s1 = s2 )
}

Listing 7-28. Table Universe tab_MEMP

tab_MEMP :=
{ M | M∊ ℘(tup_MEMP) ∧
      /* EMPNO uniquely identifies a tuple */
      ( ∀m1,m2∊M: m1(EMPNO) = m2(EMPNO) ⇒ m1 = m2 )
}

Listing 7-29. Table Universe tab_TERM

tab_TERM :=
{ T | T ∈ ℘(tup_TERM) ∧
      /* EMPNO uniquely identifies a tuple */
      ( ∀t1,t2∈ T: t1(EMPNO) = t2(EMPNO) ⇒ t1 = t2 )
}

The table universe for DEPT (shown in Listing 7-30) introduces two unique identification constraints. Next to {DEPTNO}, the set of attributes {DNAME,LOC} is also uniquely identifying in a DEPT table.

Listing 7-30. Table Universe tab_DEPT

tab_DEPT :=
{ D | D∈ ℘(tab_DEPT) ∧
      /* Department number uniquely identifies a tuple */
      ( ∀d1,d2∈D: d1(DEPTNO) = d2(DEPTNO) ⇒ d1 = d2 )
      ∧
      /* Department name and location uniquely identify a tuple */
      ( ∀d1,d2∈D:
        d1↓{DNAME,LOC} = d2↓{DNAME,LOC} ⇒ d1 = d2 )
      ∧
      /* You cannot manage more than two departments */
      ( ∀m∈D⇓{MGR}: #{ d | d∈D ∧ d(MGR) = m(MGR) } ≤ 2 )
}

The second unique identification constraint states that there cannot be two departments that have the same name and that are located in the same location; department names are unique within a location. The third table constraint is a formal representation of the constraint that no employee can manage more than two departments. It states that for every department manager, the cardinality of the set of departments managed by him/her is less than or equal to two.

Listing 7-31 shows the table universe definition for the GRD table structure.

Listing 7-31. Table Universe tab_GRD

tab_GRD :=
{ G | G∈ ℘(tup_GRD) ∧
      /* Salary grade code uniquely identifies a tuple */
      ( ∀g1,g2∈G: g1(GRADE) = g2(GRADE) ⇒ g1 = g2 )
      ∧
      /* Salary grade lower limit uniquely identifies a tuple */
      ( ∀g1,g2∈G: g1(LLIMIT) = g2(LLIMIT) ⇒ g1 = g2 )
      ∧
      /* Salary grade upper limit uniquely identifies a tuple */
      ( ∀g1,g2∈G: g1(ULIMIT) = g2(ULIMIT) ⇒ g1 = g2 )
      ∧
      /* A salary grade overlaps with at most one (lower) grade */

      ( ∀g1∈G:
        ( ∃g2∈G:  g2(LLIMIT) < g1(LLIMIT) )
          ⇒
        #{ g3 | g3∈G ∧ g3(LLIMIT) < g1(LLIMIT) ∧
                        g3(ULIMIT) ≥ g1(LLIMIT) ∧
                        g3(ULIMIT) < g1(ULIMIT) } = 1
    )
}

As you can see, this table universe specifies three unique identification constraints for GRD. The fourth table constraint specifies that there cannot be a salary gap between the ULIMIT salary and the LLIMIT salary of two consecutive salary grades. This is done by requiring that for every tuple (say g1) for which there exists a tuple that corresponds to a lower salary grade (say g2), there should exist a tuple (say g3) that corresponds to a lower salary grade and that overlaps with the salary range of tuple g1. In fact, there should exist exactly one such g3, thereby covering every salary by at most two salary grades. The formal specification states exactly what is meant and leaves no room for ambiguity.

Listings 7-32 and 7-33 define the table universes for the CRS and OFFR table structures. As you can see, only unique identification constraints are involved.

Listing 7-32. Table Universe tab_CRS

tab_CRS :=
{ C | C∈℘(tup_CRS) ∧
      /* Course code uniquely identifies a tuple */
      ( ∀c1,c2∈C: c1(CODE) = c2(CODE) ⇒ c1 = c2 )
}

Listing 7-33. Table Universe tab_OFFR

tab_OFFR :=
{ O | O∈℘(tup_OFFR) ∧
      /* Course code and begin date uniquely identify a tuple */
      ( ∀o1,o2∈O:
        o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ⇒ o1 = o2 )
      ∧
      /* Begin date and (known) trainer uniquely identify a tuple */
      ( ∀o1,o2∈{ o | o∈O ∧ o(TRAINER) ≠ -1 }:
        o1↓{STARTS,TRAINER} = o2↓{STARTS,TRAINER} ⇒ o1 = o2 )
}

The second table constraint is sort of a special case of a unique identification constraint. The set {STARTS,TRAINER} is uniquely identifying within a subset of an OFFR table. This constraint states that a trainer cannot engage more than one course offering per start date. There can certainly be multiple course offerings starting at the same date that have no trainer assigned to them yet; hence the exclusion of offerings with no assigned trainer in the universal quantification.

imagesNote There is obviously a wider limitation here that involves the full duration of an offering: trainers cannot engage another offering for the duration of an assigned offering. This involves the duration of the corresponding course (attribute DUR in the CRS table structure). Therefore, it is a multi-table constraint and will be specified in the next phase (database universe).

Listing 7-34 defines table universe tab_REG for the REG table structure.

Listing 7-34. Table Universe tab_REG

tab_REG :=
{ R | R∈ ℘(tup_REG) ∧
      /* Attendee and begin date uniquely identify a tuple */
      ( ∀r1,r2∈R:
        r1↓{STARTS,STUD} = r2↓{STARTS,STUD} ⇒ r1 = r2 )
      ∧
      /* Offering is evaluated by all attendees, or it is too early to */
      /* evaluate the offering */
      ( ∀r1,r2∈R:
        ( r1↓{COURSE,STARTS} = r2↓{COURSE,STARTS} )
          ⇒
        ( ( r1(EVAL) = -1 ∧ r2(EVAL) = -1 ) ∨
          ( r1(EVAL) ≠ -1 ∧ r2(EVAL) ≠ -1 )
      ) )
}

Do you notice that the first table constraint does not state {COURSE,STARTS,STUD} is uniquely identifying with a REG table? Considering that REG is a “child table” of CRS, you would typically expect COURSE to be part of the set of attributes. However, in the real world we don’t allow students to register for two offerings that start at the same date, whatever courses are involved in these offerings. Therefore, the reduced set {STARTS,STUD} is uniquely identifying in a REG table. Here too there is a wider limitation involving the duration of a course that will be specified in the definition of the database universe later on.

The second table constraint states that within the registrations of a course offering, either all evaluations (still) hold value -1, or all of them hold a value that differs from -1. Note that (quoting from the embedded comment) “evaluated by all attendees” includes the special value 0. The idea here is that near the end of the course offering, the trainer explicitly changes—in a single transaction—all evaluation values from -1(too early to evaluate) to 0 (not evaluated). The students then evaluate the offering individually, each in their own transaction. As you’ll see in the next chapter, you can formally specify this intended behavior through a dynamic (state transition) constraint.

Listing 7-35 defines the last table universe (tab_HIST) regarding the HIST table structure.

Listing 7-35. Table Universe tab_HIST

tab_HIST :=
{ H | H∈ ℘(tup_HIST) ∧
      /* Employee number and history end date uniquely identify a tuple */
      ( ∀h1,h2∈H: h1↓{EMPNO,UNTIL} = h2↓{EMPNO,UNTIL} ⇒ h1 = h2 )
      ∧
      /* Either department number or monthly salary (or both) */
      /* must have changed between two consecutive history records */
      ( ∀h1,h2∈H:
        ( h1(EMPNO) = h2(EMPNO) ∧
          h1(UNTIL) < h2(UNTIL) ∧
          ¬ ( ∃h3∈T: h3(EMPNO) = h1(EMPNO) ∧
                     h3(UNTIL) > h1(UNTIL) ∧
                     h3(UNTIL) < h2(UNTIL) )
        ) ⇒
        ( h1(MSAL) ≠ h2(MSAL) ∨ h1(DEPTNO) ≠ h2(DEPTNO) )
      )
}

The first (unique identification) table constraint states that there can be at most one history record per employee per date. The second table constraint states that two consecutive history records (for the same employee) cannot have both an unchanged monthly salary and an unchanged department number; at least one of these attributes must have a different value. The way the table predicate specifies this is by universally quantifying over all pairs of HIST tuples (h1 and h2); if such a pair involves two consecutive HIST tuples for the same employee, then either the MSALor the DEPTNO attribute values must differ. The fact that two HIST tuples are consecutive is specified by the existential quantification: there should not exist another (third) HIST tuple (h3) for the same employee that “sits in between” the other two tuples.

Again, the formal specification for this constraint leaves no room for ambiguity.

This concludes the treatment of the table universes of the example database design. The next section moves on the definition of the database universe. This definition will build on the table universes and will formally specify the database constraints.

Database Universe

You have now reached the last stepping stone (phase) of this book’s formal methodology to specify a database design. As with the other phases, it continues to build on the sets specified in the previous phase.

You can use the table universes to build a set of admissible database states (we’ll demonstrate this shortly). Such a set is called a database universe. Every element in the database universe is an admissible database state for the database design.

At the beginning of this chapter, we said, “Every database state is an admissible set of n tables (one per table structure).” Of course, this was loosely speaking. The way you formally represent a database state was introduced in Chapter 5; it is a function where every first element of a pair is a table structure name and every second element of a pair is a table. As you’ll see, this way of representing a database state is convenient; we can easily construct functions such as this using the already defined table universes.

Let’s illustrate this with the RESULT table structure that was used in the previous sections. Because this section is about database constraints, we first introduce a second table structure to go along with the RESULT table structure. Let’s assume the existence of a LIMIT table structure. Here’s the external predicate for LIMIT: “Average score SCORE is not allowed for population POPULATION.” The idea is that the tuples in LIMIT limit the admissible tables for RESULT. We’ll specify precisely how they do so later on, by a database constraint.

Listing 7-36 specifies characterization chr_LIMIT, tuple universe tup_LIMIT, and table universe tab_LIMIT.

Listing 7-36. Specifications for chr_LIMIT, tup_LIMIT, and tab_LIMIT

chr_LIMIT :=
{ ( POPULATION; {'DP','NON-DP'} )
, ( SCORE;      {'A','F'}       )
}
tup_LIMIT :=
{ l | l∈Π(chr_LIMIT) ∧
      l(POPULATION) = 'DP' ⇒ l(SCORE) = 'A' ∧
      l(POPULATION) = 'NON-DP' ⇒ l(SCORE) = 'F'
}
tab_LIMIT :=
{ L | L∈℘(tup_LIMIT) ∧
      ( ∀l1,l2∈L: l1(POPULATION) = l2(POPULATION) ⇒ l1 = l2 )
}

The tuple and table universes for LIMIT are rather small sets. Here is an enumerative specification for both of them:

tup_LIMIT = { { (POPULATION;'DP')    , (SCORE;'A') }
            , { (POPULATION;'NON-DP'), (SCORE;'F') } }
tab_LIMIT = { Ø
            , { { (POPULATION;'DP')    , (SCORE;'A') } }
            , { { (POPULATION;'NON-DP'), (SCORE;'F') } }
            , { { (POPULATION;'DP')    , (SCORE;'A') }
              , { (POPULATION;'NON-DP'), (SCORE;'F') } }
            }

As you can see, there are only four admissible LIMIT tables: the empty table, two tables with one tuple, and one table with two tuples.

imagesNote The unique identification table constraint in the specification of tab_LIMIT is in fact redundant. It is implied by the tuple constraints and the characterization.

Figure 7-4 displays tab_LIMIT using the shorthand notation for tables.

images

Figure 7-4. Enumerative specification of tab_LIMIT using table shorthand notation

Given these two table structures, we can now take a look at a database state. Figure 7-5 displays a database state, named DBS1, that involves the RESULTand LIMIT table structures.

images

Figure 7-5. Database state DBS1

As you can see, it is a function that holds two pairs. The first ordered pair holds—as the second coordinate—an admissible RESULT table (that is, an element of tab_RESULT). Likewise, the second ordered pair holds an admissible LIMIT table (that is, an element of tab_LIMIT). These tables are accessible via their table structure names: the values appearing as the first coordinates, RESULT and LIMIT. Using the definition of DBS1, you can write expressions such as DBS1(RESULT)and DBS1(LIMIT), which yield the displayed RESULTand LIMITtables, respectively.

Now take a look at the following set function named DBCHR:

DBCHR := { (RESULT; tab_RESULT), (LIMIT; tab_LIMIT) }

Set function DBCHR is called a database characterization. It characterizes a database in that it introduces the names of the table structures that are involved in a database design and attaches to them the relevant table universes (the sets that hold the admissible tables for each table structure).

imagesNote Database state DBS1 is an element of Π(DBCHR). DBS1 has the same domain as function DBCHR, and every second coordinate of DBS1 is an element chosen from the respective set (tab_RESULT and tab_LIMIT) that appears as the second coordinate in DBCHR.

You can construct a set that holds all possible database states for the result/limit database design, by taking the generalized product of set function DBCHR:

DB_U1 := { dbs | dbs∈Π(DBCHR) }

In set DB_U1, every element dbs is a possible database state given the table universes tab_RESULT and tab_LIMIT. The generalized product operator generates every possible combination of a RESULT table and a LIMIT table; there are 52 combinations in total (#tab_RESULT times #tab_LIMIT).

Now, finally database constraints enter the picture. You can restrict set DB_U1 by adding database predicates inside the specification. Take a look at Listing 7-37, which introduces set DB_U2.

Listing 7-37. Database Universe DB_U2

DB_U2 :=
{ dbs | dbs∈Π(DBCHR)∧
          ( ∀r∈db(RESULT),l∈db(LIMIT):
               r(POPULATION) = l(POPULATION) ⇒ r(AVG_SCORE) ≠ l(SCORE) ) }

The specification of DB_U2 contains one database predicate. It states that if there is a limit regarding database professionals, then there cannot be a result with an average score of A for database professionals (regardless of the course), and if there is a limit regarding non-database professionals, then there cannot be a result with an average score of F for non-database professionals.

This database predicate discards possible database states that were generated by the generalized product operator, but that do not reflect admissible representations of our real world. Database predicates that constrain the elements in a database universe are called database constraints.

imagesNote Given the database constraint of universe DB_U2, state DBS1 shown in Figure 7-5 is not an admissible state (that is, DB_U2∉DBS1); the RESULT table holds a tuple that has an average score of A for database professionals. This is not allowed given the contents of the LIMIT table in database state DBS1. In universe DB_U2, the database constraint will discard a total of 26 (possible) database states. You might want to verify this for yourself.

We’ll often specify the database constraints in a slightly different manner. Instead of specifying a database predicate with one parameter of type database state, we’ll specify them with two or more parameters of type table. To illustrate this, here is an alternative manner of defining database universe DB_U2:

DB_U2 := { dbs | dbs∈Π(DBCHR) ∧ PDC1(dbs(RESULT),dbs(LIMIT)) }

Inside this definition, we instantiate a predicate named PDC1 that accepts two tables as its arguments. It is defined as follows (R and L represent parameters of type table):

PDC1(R,L) := ( ∀r∈R,l∈L:
                  r(POPULATION) = l(POPULATION) ⇒ r(AVG_SCORE) ≠ l(SCORE)
             )

These two definitions together form an equivalent specification to the one given in Listing 7-37. This way of specifying shows immediately how many tables are involved in each database constraint, and inside the database predicate you can use the table parameters directly instead of having to invoke the database state with a certain table structure name.

Let’s now continue with the example database design and define its database universe, thereby formally specifying all database constraints. We start by defining the database characterization. Take a look at Listing 7-38, which holds the definition for DB_CHREX.

Listing 7-38. Database Characterization DB_CHREX

DB_CHREX :=
{ ( EMP;  tab_EMP  )
, ( SREP; tab_SREP )
, ( MEMP; tab_MEMP )
, ( TERM; tab_TERM )
, ( DEPT; tab_DEPT )
, ( GRD;  tab_GRD  )
, ( CRS;  tab_CRS  )
, ( OFFR; tab_OFFR )
, ( REG;  tab_REG  )
, ( HIST; tab_HIST )
}

Set function DB_CHREX introduces ten table aliases and attaches to each of them the relevant table universe that was specified in the previous phase. The generalized product of DB_CHREX will produce a set with many possible database states. Our database universe DB_UEX, which is defined in Listing 7-39, establishes 36 database constraints. They are specified by name only in this definition. In the sections that follow, you’ll find the formal specification for each of these named constraints.

Listing 7-39. Database Universe DB_UEX

DB_UEX :=
{ dbs | dbs∈Π(DB_CHREX) ∧
   /* ======================================= */
   /* Start of Subset Requirement Constraints */
   /* ======================================= */
   PSSR1(dbs(EMP),dbs(DEPT))  ∧ PSSR2(dbs(DEPT),dbs(EMP))   ∧
   PSSR3(dbs(MEMP),dbs(EMP))  ∧ PSSR4(dbs(TERM),dbs(EMP))   ∧
   PSSR5(dbs(EMP),dbs(GRD))   ∧ PSSR6(dbs(OFFR),dbs(CRS))   ∧
   PSSR7(dbs(OFFR),dbs(DEPT)) ∧ PSSR8(dbs(OFFR),dbs(EMP))   ∧
   PSSR9(dbs(REG),dbs(EMP))   ∧ PSSR10(dbs(REG),dbs(OFFR))  ∧
   PSSR11(dbs(HIST),dbs(EMP)) ∧ PSSR12(dbs(HIST),dbs(DEPT)) ∧
   /* =================================== */
   /* Start of Specialization Constraints */
   /* =================================== */
   PSPEC1(dbs(EMP),dbs(SREP)) ∧ PSPEC2(dbs(EMP),dbs(MEMP))  ∧
   /* ================================== */
   /* Start of Tuple-in-Join Constraints */
   /* ================================== */
   
PTIJ1(dbs(EMP),dbs(GRD))            ∧ PTIJ2(dbs(EMP),dbs(TERM))            ∧
   PTIJ3(dbs(SREP),dbs(EMP),dbs(MEMP)) ∧ PTIJ4(dbs(EMP),dbs(MEMP))            ∧
   PTIJ5(dbs(EMP),dbs(DEPT))           ∧ PTIJ6(dbs(EMP),dbs(HIST))            ∧
   PTIJ7(dbs(TERM),dbs(HIST))          ∧ PTIJ8(dbs(EMP),dbs(REG))             ∧
   PTIJ9(dbs(TERM),dbs(REG),dbs(CRS))  ∧
   PTIJ10(dbs(EMP),dbs(REG),dbs(OFFR),dbs(CRS)) ∧
   PTIJ11(dbs(EMP),dbs(OFFR))          ∧ PTIJ12(dbs(TERM),dbs(OFFR),dbs(CRS)) ∧
   PTIJ13(dbs(REG),dbs(OFFR))          ∧ PTIJ14(dbs(OFFR),dbs(CRS))           ∧
   PTIJ15(dbs(EMP),dbs(REG),dbs(OFFR),dbs(CRS))                               ∧
   /* =================================== */
   /* Start of Other Database Constraints */
   /* =================================== */
   PODC1(dbs(TERM),dbs(MEMP))                   ∧ PODC2(dbs(TERM),dbs(DEPT)) ∧
   PODC3(dbs(OFFR),dbs(DEPT),dbs(EMP),dbs(CRS)) ∧ PODC4(dbs(OFFR),dbs(REG))  ∧
   PODC5(dbs(OFFR),dbs(REG))                    ∧ PODC6(dbs(OFFR),dbs(REG))  ∧
   PODC7(dbs(OFFR),dbs(REG),dbs(EMP))
}

As you’ll see, the majority of the database constraints are based on one of the common types of predicates that were introduced in the section “Common Patterns of Table and Database Predicates” in Chapter 6; there are 12 subset requirement, two specialization, and 15 tuple-in-join constraints in this database universe definition. Finally, there are seven “other” (that is, not of a common type) database constraints. Most database constraints only involve a pair of table structures; however, some tuple-in-join and “other” database constraints involve more than two table structures.

In the remaining part of this section, you’ll find the formal specification of all database constraints, whose names were introduced in the definition of DB_UEX. For our convenience, the names for the table parameters in each of these predicates are the same as the names of the relevant table structures that are involved in the predicate. As you can see inside the specification of DB_UEX, the database predicates are instantiated by providing the relevant tables as the arguments.

Listing 7-40 shows database predicates PSSR1 through PSSR12, which represent the subset requirements in the example database design.

Listing 7-40. The Subset Requirements of DB_UEX

PSSR1(EMP,DEPT) :=
   /* Employee works for a known department */
   { e(DEPTNO) | e∈EMP } ⊆ { d(DEPTNO) | d∈DEPT }
PSSR2(DEPT,EMP) :=
   /* Dept mgr is a known employee, excluding admins and president */
   { d(MGR)   | d∈DEPT } ⊆
   { e(EMPNO) | e∈EMP ∧ e(JOB) ∉ {'ADMIN','PRESIDENT'} }
PSSR3(MEMP,EMP) :=
   /* Employees can report to a president or a manager only */
   { m(MGR)   | m∈MEMP } ⊆
   { e(EMPNO) | e∈EMP ∧ e(JOB) ∈ {'PRESIDENT','MANAGER' } }
PSSR4(TERM,EMP) :=
   /* A termination is for a known employee; not everyone has left */
   { t(EMPNO) | t∈TERM } ⊂ { e(EMPNO) | e∈EMP }
PSSR5(EMP,GRD) :=
   /* Employee has a known salary grade */
   { e(SGRADE) | e∈EMP } ⊆ { g(GRADE) | g∈GRD }
PSSR6(OFFR,CRS) :=
   /* Course offering is for a known course */
   { o(COURSE) | o∈OFFR } ⊆ { c(CODE) | c∈CRS }
PSSR7(OFFR,DEPT) :=
   /* Courses take place in locations where we have a department */
   { o(LOC) | o∈OFFR } ⊆ { d(LOC) | d∈DEPT }
PSSR8(OFFR,EMP) :=
   /* Trainer of course offering is a known trainer */
   { o(TRAINER) | o∈OFFR ∧ o(TRAINER) ≠ -1 } ⊆
   { e(EMPNO) | e∈EMP ∧ e(JOB) = 'TRAINER' }
PSSR9(REG,EMP) :=
   /* Course registration is for a known employee */
   { r(STUD) | r∈REG } ⊆{ e(EMPNO) | e∈EMP }
PSSR10(REG,OFFR) :=
   /* Course registration is for a known course offering */
   { r↓{COURSE,STARTS} | r∈REG } ⊆ o↓{COURSE,STARTS} | o∈OFFR }
PSSR11(HIST,EMP) :=
   /* History record is for a known employee */
   { h(EMPNO) | h∈HIST } ⊆ { e(EMPNO)} | e∈EMP }
PSSR12(HIST,DEPT) :=
   /* History record is for a known department */
   { h(DEPTNO) | h∈HIST } ⊆ { d(DEPTNO)} | d∈DEPT }

Most subset requirements are straightforward; PSSR1, 5, 6, 9, 10, 11, and 12 all state that every tuple in the first argument is related to some tuple in the second argument.

You’ll probably find PSSR7 a bit unusual. It states that {LOC} in OFFR references {LOC} in DEPT. Usually, given such a subset requirement, {LOC} would be uniquely identifying in DEPT; however, this is not the case here. This means that an OFFR tuple might be related to more than one DEPT tuple.

imageNote Apparently the example database design does not need to hold information with regards to locations, other than the name of a location. If that were the case, then a separate table structure, say with alias LOC, would have been introduced to hold location information. Also, two subset requirements, one from DEPT to LOC and one from OFFR to LOC, would have been introduced.

Predicates PSSR2 and PSSR3 state that every tuple in the first parameter value is related to some tuple in a subset of the second parameter value; the set specified at the right-hand side denotes a table restriction in these cases.

Predicate PSSR8 states that every tuple in a subset of the first parameter value is related to some tuple in a subset of the second parameter value; the sets specified at both sides denote table restrictions.

Finally, you might have noticed that inside predicate PSSR4 the proper subset operator is used; this ensures that not every employee has been terminated (at least one employee still works for the company).

You can find the definitions for the specialization constraints PSPEC1 and PSPEC2 in Listing 7-41.

Listing 7-41. The Specializations of DB_UEX

PSPEC1(EMP,SREP) :=
   /* Salesreps have a target and a commission */
   { e(EMPNO) | e∈EMP ∧ e(JOB) = 'SALESREP' } =
   { s(EMPNO) | s∈SREP }
PSPEC2(EMP,MEMP) :=
   /* Everybody, excluding presidents, is a managed employee */
   { e(EMPNO) | e∈EMP ∧ e(JOB) ≠ 'PRESIDENT' } =
   { m(EMPNO) | m∈MEMP }

Predicate PSPEC1 specifies that you can find additional information for sales representatives (and no other employees) in the SREP table structure; only sales representatives have a target and a commission. Predicate PSPEC2 specifies that for all employees excluding presidents, additional information can be found in the MEMP table structure; here the employee number of the employee that acts as the manager is maintained.

imageNote We say that SREP and MEMP are specializations of EMP.

Listing 7-42 shows database predicates PTIJ1 through PTIJ15 that represent the tuple-in-join constraints in the example database design.

Listing 7-42. The Tuple-in-Join Constraints of DB_UEX

PTIJ1(EMP,GRD) :=
   /* Monthly salary must fall within assigned salary grade */
    ( ∀e∈EMP, g∈GRD:
       e(SGRADE) = g(GRADE) ⇒ ( g(LLIMIT) ≤ e(MSAL) ∧ e(MSAL) ≤ g(ULIMIT) ) )
PTIJ2(EMP,TERM) :=
/* Leave date must fall after hire date */
    ( ∀e∈EMP, t∈TERM:
       e(EMPNO) = t(EMPNO) ⇒ e(HIRED) < t(LEFT) )
PTIJ3(SREP,EMP,MEMP) :=
   /* Salesreps cannot earn more than the employee they report to */
    ( ∀s∈SREP, es∈EMP, m∈MEMP, em∈EMP:
     ( s(EMPNO)=es(EMPNO) ∧ es(EMPNO)=m(EMPNO) ∧ m(MGR) = em(EMPNO) )
       ⇒
     ( es(MSAL) + s(COMM)/12 < em(MSAL) ) )
PTIJ4(EMP,MEMP) :=
   /* Non-salesreps cannot earn more than the employee they report to */
    ( ∀e∈EMP, m∈MEMP, em∈EMP:
     (e(EMPNO)=m(EMPNO) ∧ m(MGR) = em(EMPNO) ∧ e(JOB) ≠ 'SALESREP')
       ⇒
     ( e(MSAL) < em(MSAL) ) )
PTIJ5(EMP,DEPT) :=
   /* Department manager must work for department he/she manages */
    ( ∀e∈EMP, d∈DEPT:
       e(EMPNO) = d(MGR) ⇒ e(DEPTNO) = d(DEPTNO) )
PTIJ6(EMP,HIST) :=
   /* No history records allowed at or before hire date */
   ( ∀e∈EMP, h∈HIST:
      e(EMPNO) = h(EMPNO) ⇒ e(HIRED) < h(UNTIL) )
PTIJ7(TERM,HIST) :=
   /* No history records allowed at or after leave date */
    ( ∀t∈TERM, h∈HIST:
       t(EMPNO) = h(EMPNO) ⇒ t(LEFT) > h(UNTIL) )
PTIJ8(EMP,REG) :=
   /* You cannot register for offerings in 1st four weeks on the job */
    ( ∀e∈EMP, r∈REG:
       e(EMPNO) = r(STUD) ⇒ e(HIRED) + 28 ≤ r(STARTS) )
PTIJ9(TERM,REG,CRS) :=
   /* You cannot register for offerings given at or after leave date */
    ( ∀t∈TERM, r∈REG, c∈CRS:
     ( t(EMPNO) = r(STUD) ∧ r(COURSE) = c(CODE) )
       ⇒
     ( t(LEFT) ≥ r(STARTS) + c(DUR) ) )
PTIJ10(EMP,REG,OFFR,CRS) :=
   /* You cannot register for overlapping course offerings */
    ( ∀e∈EMP, r1,r2∈REG, o1,o2∈OFFR, c1,c2∈CRS:
     ( e(EMPNO)           = r1(STUD)           ∧
       r1↓{COURSE,STARTS} = o1↓{COURSE,STARTS} ∧
       o1(COURSE)         = c1(CODE)           ∧
       e(EMPNO)           = r2(STUD)           ∧
       r2↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∧
       o2(COURSE)         = c2(CODE)
     ) ⇒
     ( o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∨
       o1(STARTS) ≥ o2(STARTS) + c2(DUR)       ∨
       o2(STARTS) ≥ o1(STARTS) + c1(DUR) ) )
PTIJ11(EMP,OFFR) :=
   /* Trainer cannot teach courses before hire date */
    ( ∀e∈EMP, o∈OFFR:
       e(EMPNO) = o(TRAINER) ⇒ e(HIRED) ≤ o(STARTS) )
PTIJ12(TERM,OFFR,CRS) :=
   /* Trainer cannot teach courses at or after leave date */
    ( ∀t∈TERM, o∈OFFR, c∈CRS:
     ( t(EMPNO) = o(TRAINER) ∧ o(COURSE) = c(CODE) )
       ⇒
     ( t(LEFT) ≥ o(STARTS) + c(DUR)                ) )
PTIJ13(REG,OFFR) :=
   /* Trainer cannot register for offerings taught by him/herself */
    ( ∀r∈REG, o∈OFFR:
       r↓{COURSE,STARTS} = o↓{COURSE,STARTS} ⇒
       r(STUD) ≠ o(TRAINER) )
PTIJ14(OFFR,CRS) :=
   /* Trainer cannot teach different courses simultaneously */
    ( ∀o1,o2∈OFFR, c1,c2∈CRS:
     ( o1(TRAINER) = o2(TRAINER) ∧
       o1(COURSE)  = c1(CODE)    ∧
       o2(COURSE)  = c2(CODE)
     ) ⇒
     ( o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∨
       o1(STARTS) ≥ o2(STARTS) + c2(DUR)       ∨
       o2(STARTS) ≥ o1(STARTS) + c1(DUR) ) )
PTIJ15(EMP,REG,OFFR,CRS) :=
   /* Employee cannot register for course offerings that overlap */
   /* with another course offering where he/she is the trainer */
    ( ∀e∈EMP, r∈REG, o1,o2∈OFFR, c1,c2∈CRS:
     ( e(EMPNO)   = r(STUD)                    ∧
       r↓{COURSE,STARTS} = o1↓{COURSE,STARTS} ∧
       o1(COURSE) = c1(CODE)                   ∧
       e(EMPNO)   = o2(TRAINER)                ∧
       o2(COURSE) = c2(CODE)
     ) ⇒
     ( o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∨
       o1(STARTS) ≥ o2(STARTS) + c2(DUR)       ∨
       o2(STARTS) ≥ o1(STARTS) + c1(DUR) ) )

Predicates PTIJ1 to PIJ15 all limit the tuples that are allowed in some join. Most of these joins involve only two table structures. Five predicates (PTIJ3, PTIJ9, PTIJ10, PTIJ12, PTIJ15) limit the tuples in joins that involve more than two table structures. Let’s take a closer look at those five.

Predicate PTIJ3 joins an SREP tuple (variable s) with its related EMP tuple (variable es), continues to join this EMP tuple with its related MEMP tuple (variable m), and finally joins this MEMP tuple with its related EMP tuple (variable em). All tuples that are in this join across four tables (note that EMP is joined twice) are constrained by the predicate es(MSAL) + s(COMM)/12 < em(MSAL). This states that the monthly salary of the sales representative, increased with a twelfth of his/her commission, should be less than the monthly salary of the employee that is managing this sales representative. By the way, this managing employee cannot be a sales representative—which might have required adding a twelfth of a commission at the right-hand side of the smaller-than symbol too—due to subset requirement PSSR3.

Predicate PTIJ9 joins a TERM tuple (t) directly with a REG tuple (r) and continues to join the REG tuple with its related CRS tuple (c).

imageNote A few notes:

The join from TERM to REG is not done “via EMP.” Because TERM is a specialization of EMP, you can join TERM directly to any other table structure that EMP can join to using the EMPNO attribute.

The join from REG to CRS is not done “via OFFR.” From the two subset requirements PSSR6 and PSSR10, you can deduce that {CRS} in REG references {CODE} in CRS (a subset requirement between REG and CRS), and you can thus join REG directly to CRS in the way shown.

Tuples in this join are constrained by the predicate t(LEFT) ≥ r(STARTS) + c(DUR). Do you notice that this formal specification somewhat differs from the embedded informal comment? The formal specification involves the CRS table structure too, or rather its DUR attribute. Here’s a more correct informal explanation of this constraint: an employee cannot register for a course offering that starts at or after his or her leave date, nor can an employee be terminated during a course offering that he or she is attending as a student.

Now you’re going down a path that is often hard to avoid when you try to describe database constraints informally: you start talking in terms of transactions that are not allowed given the constraint. This is what just happened earlier: you cannot insert an OFFR tuple given a certain TERM tuple, and vice versa, you cannot insert a TERM tuple given a certain OFFR tuple. With the increasing scope of data covered by a constraint, it becomes increasingly more difficult to informally—using a natural language—state the what of a constraint without alienating yourself from your customer. You often must dive into the how of a constraint. In your communication with the end users, this is not a bad thing; you start communicating in terms of how the system will behave (what transactions it will not accept given the constraint), and end users typically understand this. Your formal specification is the basis from which you can generate this behavior (we will deal with this in detail in Chapter 11). For instance, you might want to briefly mention to the user that given this constraint, the duration of a course cannot be increased (updated) given certain circumstances.

Predicate PTIJ10 joins an EMP tuple with two related REG tuples (r1 and r2), continues to join each of these two REG tuples with their related OFFR tuple (o1 and o2), and finally joins the two OFFR tuples with their related CRS tuple (c1 and c2). Tuples in this join are constrained by the following predicate:

o1↓{COURSE,STARTS} = o2↓{COURSE,STARTS} ∨
o1(STARTS) ≥ o2(STARTS) + c2(DUR)       ∨
o2(STARTS) ≥ o1(STARTS) + c1(DUR)

This predicate states that if the join doesn’t involve the same two OFFR tuples, then offering o1 must start after offering o2 ends, or vice versa.

Predicate PTIJ12 is the “trainer” version of PTIJ9. Instead of joining TERM via REG (using the STUD attribute) to CRS, predicate PTIJ12 joins TERM via OFFR (using the TRAINER attribute) to CRS and limits the resulting tuples in the join in the same way.

Predicate PTIJ15 joins an EMP tuple to a CRS tuple via REG and OFFR (representing the student that attends a course offering), and continues to join that same EMP tuple to a CRS tuple via OFFR (representing the trainer that gives an offering). It then restricts the tuples in this join in the same way as predicate PTIJ10 does.

You might have noticed a correlation between predicates PTIJ15 and PTIJ13; we will deal with this in an exercise at the end of this chapter.

Listing 7-43 shows database predicates PODC1 through PODC7, which represent the remaining seven other database constraints in the example database design.

Listing 7-43. The “Other” Database Constraints of DB_UEX

PODC1(TERM,MEMP) :=
   /* Active employee cannot be managed by terminated employee */
   { t1(EMPNO) | t1∈TERM } ∩
   { m(MGR) | m∈MEMP ∧
              ¬ ( ∃t2∈TERM: t2(EMPNO) = m(EMPNO) ) } = Ø
PODC2(TERM,DEPT) :=
   /* Department cannot be managed by a terminated employee */
   { t(EMPNO) | t∈TERM } ∩ { d(MGR) | d∈DEPT } = Ø
PODC3(OFFR,DEPT,EMP,CRS) :=
   /* At least half of the course offerings taught by a trainer */
   /* must be at home base */
    ( ∀e1∈{ o1(TRAINER) | o1∈OFFR ∧ o1(STATUS) ≠ 'CANC' }:
     ( Σt∈{ o2∪c2| d2∈DEPT ∧ e2∈EMP ∧ o2∈OFFR ∧ c2∈CRS ∧
                    e2(EMPNO)  = e1          ∧
                    e2(EMPNO)  = o2(TRAINER) ∧
                    e2(DEPTNO) = d2(DEPTNO)  ∧
                    o2(COURSE) = c2(CODE)    ∧
                    o2(STATUS) ≠ 'CANC'      ∧
                    o2(LOC)    = d2(LOC)
          } : t(DUR)
    ) ≥

    ( Σt∈{ o3∪c3| d3∈DEPT ∧ e3∈EMP ∧ o3∈OFFR ∧ c3∈CRS ∧
                    e3(EMPNO)  = e1          ∧
                    e3(EMPNO)  = o3(TRAINER) ∧
                    e3(DEPTNO) = d3(DEPTNO)  ∧
                    o3(COURSE) = c3(CODE)    ∧
                    o3(STATUS) ≠ 'CANC'      ∧
                    o3(LOC)    ≠ d3(LOC)
           } : t(DUR) ) )
PODC4(OFFR,REG) :=
   /* Offerings with 6+ registrations must have status confirmed */
    ( ∀o∈OFFR:
     #{ r | r∈REG ∧
            r↓{COURSE,STARTS} = o↓{COURSE,STARTS} } ≥ 6
     ⇒
     o(STATUS) = 'CONF' )
PODC5(OFFR,REG) :=
    /* Number of registrations cannot exceed maximum capacity of offering */
     ( ∀o∈OFFR:
      #{ r | r∈REG ∧
             r↓{COURSE,STARTS} = o↓{COURSE,STARTS} } ≤ o(MAXCAP) )
PODC6(OFFR,REG) :=
   /* Canceled offerings cannot have registrations */
    ( ∀o∈OFFR: o(STATUS) = 'CANC'
               ⇒
               ¬( ∃r∈REG: r↓{COURSE,STARTS} = o↓{COURSE,STARTS} ) )
PODC7(OFFR,REG,EMP) :=
   /* You are allowed to teach a certain course only if: */
   /* 1. You have been employed for at least one year, or */
   /* 2. You have attended that course first and the trainer of that */
   /*    course offering attends your first teach as participant */
    ( ∀o1∈OFFR:
     /* If this is the 1st time this trainer gives this course ... */
     ( ¬∃o2∈OFFR:
         o1↓{COURSE,TRAINER} = o2↓{COURSE,TRAINER} ∧
         o2(STARTS) < o1(STARTS)
     ) ⇒
   (/* then there should be an attendee in the classroom ... */
      ( ∃r1∈REG:
          r1↓{COURSE,STARTS} = o1↓{COURSE,STARTS} ∧
          /* who has given this course at an earlier date ... */
          ( ∃o3∈OFFR:
            o3(TRAINER) = r1(STUD)  ∧
            o3(COURSE) = o1(COURSE) ∧
            o3(STARTS) < o1(STARTS) ∧
      /* and *that* course was attended by the current trainer */
            ( ∃r2∈REG:
              o3↓{COURSE,STARTS} = r2↓{COURSE,STARTS} ∧

                  r2(STUD) = o1(TRAINER) )
          ) ) ) ∨
        /* or, this trainer has been employed for at least one year */
          ( ↵{ e(HIRED) | e∈EMP ∧ e(EMPNO) = o1(TRAINER) } <
            o1(STARTS) - 365 ) ) )

Predicate PODC1 states that the intersection of two sets should be the empty set. The first set holds all employee numbers of terminated employees. The second set holds all employee numbers of employees who manage an active (that is, not terminated) employee. By specifying that the intersection of these two sets is empty, you represent the informal requirement that an active employee cannot be managed by a terminated employee.

In the same way, predicate PODC2 represents the informal requirement that a department is managed by an active employee.

Predicate PODC3 universally quantifies over all employee numbers of employees that act as the trainer of some offering that has not been canceled. For each such employee number, it builds the set of (not canceled) OFFR tuples joined with their related CRS tuple, where the trainer of the offering equals the given employee (number), and the location of the offering is equal to the location of the department in which the given employee works. It then sums the DUR attribute values of all tuples in this set. The (summed) duration of offerings for the given trainer where the offering was given—at a location that differs from the trainer’s department location—is determined in the same way. For every given trainer, the former sum cannot be less than the latter sum.

Of course, this is not the way you communicate with your users. You stick to the English sentence provided by them (see the embedded comments in the code) and verify the meaning of the terminology that they use in that sentence. The verb “taught” in “taught by a trainer” clearly means that the offering must have status scheduled or confirmed; canceled offerings are not to be considered. It appears that counting offerings involves the duration of the offering. Finally, an offering is considered to be “at home base” if its location is the same as the location of the department that employs the trainer.

imagesNote Constraint PODC3 requires that attribute LOC of the OFFR table structure can (meaningfully) be compared to attribute LOC of the DEPT table structure. Constraint PSSR7 has been introduced to ensure exactly this requirement.

Predicate PODC4 states that offerings for which six or more students have registered must have status confirmed. It does this by inspecting the cardinality of the set that holds all registrations that are related to a given offering. If this cardinality is six or more, then the status of the given offering must be confirmed.

The structure of predicate PODC5 is similar to the structure of PODC6. It universally quantifies over all offerings, and requires that the number of related registrations for the given offering cannot exceed the maximum capacity of the given offering.

imagesNote The combination of predicates PODC4 and PODC5 imply that it would be useless to allow an offering to have a maximum capacity of less than six; such an offering could never reach status confirmed. This is exactly why the attribute value set of attribute MAXCAP starts at value six (see Listing 7-9).

Predicate PODC6 states that registrations for canceled offerings cannot exist. Consequently, whenever an offering gets canceled, all related registrations are to be removed.

Predicate PODC7 constitutes the pièce de résistance of our example database universe. It requires that the first-time offering of every course taught by a trainer satisfies a certain condition. If the given trainer of such an offering has not been employed for at least one year, then this given trainer must have attended a prior offering of the same course, and the trainer of that prior course offering must attend such a first-time offering taught by a given trainer. Predicate PODC7 explicitly holds additional comments to guide you through its formal specification.

When you try to read and understand a complex predicate such as PODC7, it’s useful to first study the top-down structure of the predicate. In this case, the structure of the predicate is as follows:

( ∀o1∈OFFR: P1(o1,OFFR) ⇒ ( P2(o1,OFFR,REG) ∨ P3(o1,EMP) )

Predicate PODC7 is a universal quantification over all offerings, and states that if some condition (P1) that involves the given offering and other offerings is TRUE, then a disjunction of two conditions (P2 and P3) should hold. Condition P1 states that the given offering is the first-time offering. Condition P2 involves the given offering, other offerings, and registrations; it states the complex requirement with regards to the prior offering that must have been attended by the trainer of the given offering. Condition P3 involves the given offering and employees; it states that the trainer of the given offering has been employed for at least one year.

imagesNote Constraint PODC7 implies that the very first offering given for every course must be given by a trainer that has been employed for at least one year. Evidently it takes at least a year to develop the material and reach the skill level necessary to offer a course.

This concludes the demonstration of how to formally specify a relational database design. You have seen how you can use set theory, in combination with logic, to produce a rock-solid database design specification. The method demonstrated also gives us a good and clear insight into the relevant constraints that play a role in the various layers of such a database design.

Chapter Summary

This section provides a summary of this chapter, formatted as a bulleted list. It provides a high-level overview of the material discussed in this chapter.

  • Database designs are often documented using the natural language. The major part of these design documents consists of the constraint specifications that will make the database design a satisfactory fit for the real world.
  • If you use plain English to express data integrity constraints, you will inevitably hit the problem of how English sentences map, unambiguously, into the table structures.
  • You can use set theory to formally specify a database design in a layered (phased) manner. By formalizing your database design specification, you will eliminate all ambiguity—especially the ambiguity surrounding the constraints.
  • In the first phase, you specify a set function for every table structure. This set function introduces the attribute names for the table structure and attaches the relevant attribute value set to each of them. These set functions are called characterizations.
  • In the second phase, you specify for every table structure a set of tuples called the tuple universe. It holds admissible tuples for the table structure. You can construct a tuple universe by taking the generalized product of a characterization and restricting this result by specifying tuple predicates. These tuple predicates are called tuple constraints.
  • In the third phase, you specify for every table structure a set of tables called the table universe. It holds admissible tables for the table structure. You can construct a table universe by taking the powerset of a tuple universe and restricting this result by specifying table predicates. These table predicates are called table constraints.
  • In the last phase, you specify a set of database states called the database universe. It holds the admissible database states for the database being specified. You can construct a database universe by first defining a database characterization. A database characterization arranges all table universes into a single set function. You can construct the database universe by taking the generalized product of the database characterization and restricting this result by specifying database predicates. These database predicates are called database constraints.
  • This formal model is your—the database professional’s—reference of the database design. Users should never be confronted with this formalism. You can use an external predicate, in conjunction with the tuple constraints, to convey the meaning of the attributes of each table structure. Often you should explain table and database constraints to users by talking about the implications they have on the behavior of the system. Finally, by using rewrite rules to rewrite the predicates that constitute the constraints, you can find alternative, informal ways to discuss the relevant constraints with users.

Exercises

  1. Rewrite table predicate P3 (see Listing 7-23) into a negation of an existential quantification.
  2. One of your colleagues suggests that the following tuple predicate should be added to the specification of tuple universe tup_OFFR:
    o(STATUS)='CONF' ⇒ o(TRAINER) ≠ -1

    What is your response?

  3. Modify the specification of table universe tab_MEMP so that it includes a constraint that states that no manager can directly manage more than ten employees.
  4. In table universe tab_GRD, there is currently no limit to the number of salary grades that are allowed to overlap with one another. Add a table constraint that ensures that a salary value falls within at most two salary grades.
  5. Write down database state DBS1 using the longhand notation for tables.
  6. Within the context of universe DB_U2, what are the admissible RESULT tables that can be combined with the following LIMIT table?
    { { (POPULATION;'DP'), (SCORE;'A') },
      { (POPULATION;'NON-DP'), (SCORE;'F') } }.
  7. Database constraint PTIJ5 implies that no employee can manage more than one department. This renders the third table constraint of table universe tab_DEPT useless: it can never be violated given PTIJ5.
    • Change the specification of the database constraint PTIJ5 such that if the employee manages two departments, then one of those must be the department in which the employee is employed.
    • Add a new constraint stating that if an employee manages two departments, then these two departments must be at the same location.
  8. Predicate PODC6 is a disguised tuple-in-join predicate. Can you provide the alternative formal specification that follows the structure of a tuple-in-join predicate?
  9. The example database design deliberately does not have a constraint stating that cycles are not allowed in the hierarchy of managed employees. This is because other constraints prevent such cycles from ever happening. Do you see which ones these are?
  10. Give an alternative formal specification of predicate PODC1 using quantifiers.
  11. The employee with employee number 1000 has a special status within the company. Take a look at the following database constraint with regards to this employee:
    PODC8(OFFR,REG) :=
      (∀o∈OFFR: (∃r∈REG: r↓{course,starts}=o↓{course,starts) ∧
                           r(stud)=1000)
                 ⇒
                ( #{ r2 | r2∈REG ∧ r2↓{course,starts}=o↓{course,starts)} = 1
                  ∧
                  ¬(∃o2∈OFFR: o2↓{loc,starts}=o↓{loc,starts} ∧
                              o2(course)≠o(course))
                )
    )

    This constraint states that whenever employee 1000 is attending an offering, then no other students are allowed to attend this offering, nor can there be any other offering (for a different course) starting at the same date and location as this offering. This constraint is actually a conjunction of two constraints, where one is a (disguised) table constraint and the other a (simpler) database constraint. Find out what these two constraints are by applying various rewrite rules introduced in Part 1 of this book.

  12. Take a look at the following database state, named db_empty:
    { ( EMP; Ø ), ( SREP; Ø ), ( MEMP; Ø ), ( TERM; Ø ), ( DEPT; Ø ),
      ( GRD; Ø ), ( CRS;  Ø ), ( OFFR; Ø ), ( REG;  Ø ), ( HIST; Ø ) }

    Is state db_empty an element of database universe DB_UEX?

  13. Extend the example database design with a BONUS table structure. Here’s the external predicate of this table structure: “The employee with employee number EMPNO has received a bonus of AMOUNT dollars in year YEAR.” The following constraints apply with regards to this new table structure:
    • The yearly bonus cannot exceed the upper limit of the salary grade of the employee.
    • An employee can only receive a bonus for full years of (active) employment.
    • Sales reps do not receive bonuses.

    Provide a characterization, tuple universe, and table universe for the BONUS table structure and (if necessary) modify the DB_UEX universe.

  14. The president has decided that the business rule “no two offerings of the same course can start on the same day” should be relaxed to “only if two offerings of the same course take place at different locations, are they allowed to start on the same day.” Analyze the impact of this change with regards to the data integrity constraints of the DB_UEX universe.
  15. With a slight change to predicate PTIJ15, you can have it include the limitation imposed by predicate PTIJ13; it then implies PTIJ13, which means you can get rid of PTIJ13. Do you see how to change PTIJ15?
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.149.19