images

Chapter 2 laid down the basics of set theory. This fourth chapter will build on those basics and introduce some more set-theory concepts that will be used in Part 2 of this book as tools to define a database design formally—including all constraint specifications.

In the section “Binary Relations,” we’ll revisit ordered pairs and the Cartesian product to introduce the concept of a binary relation between two sets. As you’ll see, such a binary relation is in fact a subset of the Cartesian product of two sets.

The section “Functions” then introduces the important concept of a function. A function is a special kind of binary relation—one that conforms to certain properties. In this section we’ll also establish some terminology around the (set-theory) concept of a function.

The section “Operations on Functions” revisits the union, intersection, and difference set operators and takes a closer look at how these behave when applied to functions as their operands. You’ll also be introduced to a new (dyadic) set operator called limitation of a function.

This chapter then continues by introducing another important concept that we call a set function. A set function holds a special kind of ordered pair: one where the second coordinate of the pair is always a set. We’ll reveal the link that a set function has with specifying a database design: it can be used to characterize an external predicate. An external predicate describes something about the real world that we want to represent in a database. If a set function is used for this purpose, we’ll refer to it as a characterization. In the same section, you’ll be introduced to a new monadic set operator called the generalized product. The generalized product takes a set function—or rather a characterization—as its operand and produces a new set of functions. The concept of a generalized product is the second key stepping stone (next to the concept of a powerset that was introduced in Chapter 2) that will be used in Part 2 of this book when we’ll demonstrate how to define a database design formally.

We conclude this chapter with the notion of function composition, which will be applied in Chapter 5 (operations on tables) when we investigate the join in a formal way.

You’ll find a “Chapter Summary” at the end followed by an “Exercises” section. You’re strongly advised to spend sufficient time on these exercises before moving on to the other chapters.

Binary Relations

Before we start dealing with binary relations, take a look at this quote from Chapter 1:

The importance of data management being based on logic was envisioned by E. F. (Ted) Codd (1923–2003) in 1969, when he first proposed his famous relational model in the IBM research report “Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks.” The relational model for database management introduced in this research report has proven to be an influential general theory of data management and remains his most memorable achievement.

The word Relations, as used by E. F. Codd in the preceding quote in the title of his research report, created the name of the general theory we now refer to as the relational model of data.

imagesNote It is a common misconception that the word relational in “the relational model of data” refers to “relationships” (many-to-one, many-to-many, and so on) that can exist between different “entity types,” as you too probably have designed at some point using the pervasive design technique known as entity-relationship diagrams (ERDs). The word relational in “the relational model of data” has nothing to do with the R of ERD. The word refers to the mathematical concept of a (n-ary) relation, which is totally different from the “relationship” concept in ERD. The former is a mathematical—set-theory—concept; the latter eventually maps to certain cases of data integrity constraints (loosely speaking).

In this chapter, you’ll be introduced to the set-theory concept of a relation—the meaning used by E. F. Codd. More specifically, you’ll be introduced to a certain type of relation: a binary relation. We’ll deal with relationships in the ERD sense—as data integrity constraints—in Part 2 of this book.

Ordered Pairs and Cartesian Product Revisited

Let’s first recall a bit from Chapter 2. That chapter ended with the following two definitions regarding ordered pairs and the Cartesian product:

  • An ordered pair is a pair of elements, also referred to as the coordinates of the ordered pair. The notation is (a;b).
  • The Cartesian product of two given sets, say A and B, is defined as follows: A×B = { (a;b) | a∈A ∧ b∈B}

This definition shows that the result of the Cartesian product of two given sets is in fact a set of ordered pairs. It holds every possible combination of an element from the first set and one from the second set.

Listing 4-1 illustrates the Cartesian product with an example.

Listing 4-1. Example Cartesian Product

A := {X,Y,Z}
B := {1,2}
A×B = { (X;1), (X;2), (Y;1), (Y;2), (Z;1), (Z;2) }

The same section of Chapter 2 also listed the following properties:

a≠b ⇒ (a;b)≠(b;a)
(a;b)=(c;d) ⇔ (a=c ∧ b=d)
A×B=B×A ⇔ A=B

These properties illustrate the following facts:

  • The order of the coordinates of an ordered pair is important.
  • The Cartesian product is not a commutative operator.

This enables us to introduce the mathematical concept of a binary relation in the next section.

Binary Relations

A binary relation from set A to set B is defined as a subset of the Cartesian product A×B. It’s called a binary relation because it deals with two sets (A and B). Let’s start with a small example, based on the same two sets in Listing 4-1:

A := {X,Y,Z}
B := {1,2}

Take a look at the set R1 shown here:

R1 := { (X;1), (X;2), (Y;1), (Z;2) }

Note that set R1 does not hold every possible combination of an element from set A and one from set B; set R1 only holds four ordered pairs as its elements. Every ordered pair is also an element of the Cartesian product of sets A and B, as shown in Listing 4-1. Formally, this can be expressed as follows:

( ∀p∈R1: p∈A×B )

This makes R1 a subset of the Cartesian product of sets A and B, and therefore a binary relation from set A to set B. Definition 4-1 illustrates another way to define a binary relation and uses the important powerset operator that was introduced in Chapter 2.

imagesDefinition 4-1: Binary RelationR is a binary relation from set A to set B⇔ R∈℘(A×B)

The powerset of a given set holds every possible subset of that set. In the preceding case, the powerset holds every possible subset of the Cartesian product of set A and set B. Consequently, every element of this powerset is a binary relation from set A to set B.

imagesNote In the small example relation R1, every A value appears in the ordered pairs, as does every B value. This is not always true, of course. There are other binary relations from A to B where this is not the case.

You can draw a picture of a binary relation. This is done by visualizing (as in a Venn diagram) the two sets separately and connecting the paired elements of both sets with arrows. Figure 4-1 shows binary relation R1 as a picture, where the four arrows represent the ordered pairs.

images

Figure 4-1. Picture of binary relation R1

You can instantly see in the picture of binary relation R that two arrows depart from element X of set A, and only one arrow each departs from elements Y and Z. You can also immediately see that both elements 1 and 2 of set B have two arrows arriving. In the next section, a limit will be imposed on the number of arrows departing from each element to get to the definition of a function.

imagesNote Instead of declaring a binary relation to be from set A to set B, other textbooks might declare binary relations to be over set A and on set B.

Functions

A function is a binary relation that doesn’t have two elements that have the same first coordinate; or said in another way, in a function two distinct ordered pairs always have different first coordinates. From the picture point of view, this limits the number of arrows departing from an element of the set at the left-hand side to one arrow at most.

Furthermore, if every element in the set at the left-hand side has exactly one arrow departing, then we say that the binary relation is a function over this set (at the left-hand side).

Definition 4-2 illustrates how to define this additional limitation formally. It employs the π1 operator introduced in Chapter 2 (it selects the first coordinate of an ordered pair).

imagesDefinition 4-2: FunctionF is a function over set A into set B” ⇔

F∈℘(A×B) ∧ ( ∀p1,p2∈F: p1≠p2 ⇒ π1(p1)≠ π2(p2) ) ∧ {π1(p) | p∈F } = A

The second conjunct at the right-hand side of the equivalence in Definition 4-2 constitutes the additional limitation mentioned earlier: for all ordered pairs p1 and p2 that are in function F, if p1 and p2 are different pairs then they should hold different first coordinates. Put in another way (using the result of Exercise 4a of Chapter 1), if p1 and p2 have the same first coordinate then they must be the same pair. The last conjunct states that for F to be a function over A, every element of A must appear as a first coordinate.

Most of the time, we’re interested in the fact that the first coordinates of the function originate from set A (and much less that the second coordinates originate from set B). Instead of saying that “F is a function over A into B,” we say that “F is a function over A.”

Now take a look at the following examples:

F1 := { (X;1), (X;2), (Y;1), (Z;2) }
F2 := { (X;2), (Y;1), (Z;2) }
F3 := { (X;1), (Y;1), (X;1), (Z;1) }
F4 := { (Z;2) }
F5 := ⊘
F6 := { (empno;126), (ename;'Lex de Haan'), (born;'11-aug-1954') }

The first example F1 is not a function because it holds two distinct pairs—(X;1) and (X;2)—that have the same first coordinate X. The second binary relation F2 is indeed a function: no two distinct pairs have the same first coordinate. The third one F3 is also a function. Although two pairs are enumerated that have the same first coordinate (the first and the third pair), they are in fact the same pair. One of these pairs need not have been enumerated and can be left out without changing set F3. Example F4 is also a function, and the empty set (F5) is indeed a function too. It is a subset of the Cartesian product of any two sets (the empty set is a subset of every set). The additional limitation given in Definition 4-2 also holds. You might want to revisit the second rewrite rule in Table 3-2 for this. The last example binary relation F6 is a function too. Through this example, you can begin to see how functions can be used (as building blocks) to specify database designs. We’ll come back to this at the end of the section “Set Functions.”

We continue with the introduction of various terminologies around the set-theory concept of a function.

Domain and Range of Functions

The set of all first coordinates of a function, say F, is called the domain of that function; the notation is dom(F). The set of all second coordinates of a function is called the range of that function; the notation is rng(F). These two concepts are rather trivial; to find the domain or range of a function you simply enumerate all first or second coordinates of the ordered pairs of that function. Listing 4-2 lists the domains of the five functions F2 through F6 introduced in the preceding section.

Listing 4-2. Domains of the Example Functions

dom(F2) = dom(F3) = {X,Y,Z}
dom(F4) = {Z}
dom(F5) = ⊘
dom(F6) = {empno,ename,born}

Listing 4-3 gives the ranges of these five functions.

Listing 4-3. Ranges of the Example Functions

rng(F2) = {1,2}
rng(F3) = {1}
rng(F4) = {2}
rng(F5) = ⊘
rng(F6) = {126,'Lex de Haan','11-aug-1954'}

From a picture of a function, you can determine the domain and range by leaving out the elements that have no arrows departing from or arriving in them. Figure 4-2 shows a binary relation F7 from a set A := {A,B,C,D,E} to a set B := {1,2,3,4,5}. As you can see, it straight-forwardly represents a function (at most one arrow departs from the elements of set A) and dom(F7) = {A,C,D,E} and rng(F7) = {2,4}.

images

Figure 4-2. Binary relation F7 from set A to set B

We define a function to be over a set A and into a set B, if the following two properties hold:

  • dom(F) = A
  • rng(F) ⊆ B

Because F is a function over A into B, by definition the set of all first coordinates—the domain of F—is equal to A. And because every ordered pair in a function over A into B has an element of B as its second coordinate, the range of F will be a subset of set B; often it will be a proper subset of B.

imagesNote You probably have noticed that the set-theory concept of a domain introduced here is quite different from the usual meaning of the word “domain” in other database textbooks. In other textbooks, a domain usually represents the set of possible values—or if you will, the type—of a “column” in a “table.” In this book, there’s a similar concept for that meaning of the word domain (we’ll call this an attribute value set) that will be precisely defined in the section “Set Functions.”

Given that a function never has two pairs with the same first coordinate, you can unambiguously determine the range element that is paired to a given domain element of a function (remember from every domain element exactly one arrow is departing). Mathematicians have a special way of denoting that single range element for a given domain element. If F is a function and x is an element of dom(F), then F(x) (pronounce it as “F of x,” or “F applied to x”) denotes the range element that is paired to x in function F. This notation is called function application. We’ll illustrate this using function F7 introduced in Figure 4-2. Following is the formal specification of F7:

F7 := { (A;2), (C;2), (D;4), (E;4) }

Another valid way of specifying function F7 is by using the function application notation; just specify for each domain element what its range element is:

F7(A) = 2
F7(C) = 2
F7(D) = 4
F7(E) = 4

You might already be familiar with this notation from your mathematics classes in the past. Here’s a typical example specification for some function F:

F(x) := 3x2 − 2x + 1

Given function F, you’d write expressions such as F(2), which would in this case evaluate to 9 (three times four, minus two times two, plus one).

The set-theory specification of the same function F would be as follows (assuming we want to specify this function over set Z into set Z):

F := { (x; 3x2 − 2x + 1) | x ∈ Z }

Here we’ve used the substitutive method to specify function F in a set-theory way.

Identity Function

A function f is called the identity function over A if A is the domain of f, and for all elements x in set A, f of x equals x. We use the notation id(A) to represent the identity function over A. Here are a few examples:

id({1,2,3})            = { (1;1), (2;2), (3;3) }
id({deptno})           = { (deptno;deptno) }
id({course,startdate}) = { (course;course), (startdate;startdate) }

Subset of a Function

In this section, we’ll briefly investigate subsets of a function. Because a function is a set, you can construct a subset of a function. Listing 4-4 provides every (proper) subset of the function F2 := { (X;2), (Y;1), (Z;2) } that was introduced in the preceding section.

Listing 4-4. All Proper Subsets of Function F2

{ (X;2), (Y;1) }
{ (X;2), (Z;2) }
{ (Y;1), (Z;2) }
{ (X;2) }
{ (Y;1) }
{ (Z;2) }

As you can see, every proper subset of F2 is again a function. Indeed, this is a property that holds for every function. For a given function F, every proper subset of F will be a function too.

This concludes the introduction of set-theory functions. Because functions are an important concept that will be extensively used throughout the rest of this book, we offer a brief summary here.

  • A function is a binary relation and therefore a set of ordered pairs.
  • In a function every first coordinate appears only once.
  • You can use a function to specify a certain part of a database design.

The title of this chapter was deliberately chosen to be “Relations and Functions” because we felt you should be introduced to (at least) the mathematical concept of a binary relation. The more general concept of an n-ary relation (that is, not just a binary one) is a cornerstone concept introduced by E. F. Codd into the database field. The concept of a binary relation is sufficient for the mathematical method to specify database designs that will be developed in this book. Other textbooks deal with—and use in their methods—the concept of an n-ary relation. They also use n-ary relations differently than the way this book deals with a special case of the 2-ary relation: the function. To avoid confusion from here on, this book will explicitly not use the terms binary relation or relation anymore.

Operations on Functions

Because functions are sets, you can apply the set operators—union, intersection, and differ-ence—with functions as their operands. This section will explore the application of these set operators on functions.

Union, Intersection, and Difference

It’s always interesting for mathematicians to investigate the closure property of an operator. In the context of this chapter, we ask ourselves whether the union, intersection, and difference operators, which are closed on sets, are also closed on functions. Or put differently, when you apply these operators on two functions, will this result in another function?

Take a look at the following example, which introduces two functions G1 and G2, and illustrates the result of the union of those two functions:

G1 := { (a;1), (b;2), (c;3) }
G2 := { (a;2), (d;4) }
G1 ∪ G2 = { (a;1), (b;2), (c;3), (a;2), (d;4) }

This example illustrates that the union of two functions does not necessarily result in a function, because the result of the union of functions G1 and G2 holds two different ordered pairs that have the same first coordinate: (a;1) and (a;2).

Take a look at another example:

G3 := { (empno;999), (ename;'Toon Koppelaars'), (born;'14-feb-1965') }
G4 := { (empno;999), (occupation;'IT architect') }
G3∪G4 = { (empno;999), (ename;'Toon Koppelaars'),
             (born;'14-feb-1965'), (occupation;'IT architect') }

The result of the union of functions G3 and G4 does result in a function. This is because G3(empno) = G4(empno), and empno is the only domain element that both functions have in common. In the result of the union, you don’t repeat element (empno;999) twice.

It shouldn’t be too difficult now to conceive of the generic property that two functions should have for the union of those two functions to result in a function. This property is called compatibility or joinability of two functions. Definition 4-3 formally specifies this particular property.

imagesDefinition 4-3: Compatible (Joinable) Functions “Function F is compatible with function G” ⇔ ( ∀c∈(dom(F)∩dom(G)): F(c)=G(c) )

Two functions are compatible if and only if for every first coordinate that the two functions have in common (that is, that can be chosen from the intersection of the domains of both functions), the corresponding second coordinates match. Only if two functions are compatible will the union of those two functions result in another function.

Note that Definition 4-3 considers two functions that have no first coordinate in common always to be compatible. As the following example illustrates, the union of two such functions is indeed a function:

G5 := { (deptno;1), (dname;'Natural Join'), (location;'Utrecht') }
G6 := { (invoice;'inv123'), (amount;3500) }
G5 ∪ G6 = { (deptno;1), (dname;'Natural Join'), (location;'Utrecht'),
             (invoice;'abc123'), (amount;3500) }

If two functions have no first coordinate in common, then the union of those two functions won’t hold two different ordered pairs with the same first coordinate. In fact, this is a property that is always true for two functions with no common first coordinate(s). However, in practice such unions will be meaningless.

Now let’s take a look at the intersection of two functions and reuse the example functions G1 through G6 for this:

G1 ∩ G2 = ⊘
G3 ∩ G4 = { (empno;999) }
G5 ∩ G6 = ⊘

The intersection of functions G1 and G2 is empty; there is no ordered pair that is an element of both G1 and G2. The intersection of functions G3 and G4 is precisely the singleton ordered pair for which the first coordinate appears in the intersection of dom(G3) and dom(G4) (coordinate empno), and the property G3(empno) = G4(empno) holds. Finally, the intersection of functions G5 and G6 is empty; although these two functions are compatible, there is no first coordinate that they have in common, let alone an ordered pair.

The result of an intersection of two functions will always be another function. A simple way to understand this is to realize that the intersection of any two given sets will always be a subset of both given sets; let A and B be two sets, then (A ∩ B) ⊆ A and (A ∩ B) ⊆ B. The intersection of two functions will be a subset of both two functions, and therefore a function (recall the section “Subset of a Function,” with Listing 4-4).

The difference of two functions will always be a function. This is due to the same reasoning as with the intersection of two functions. The difference of set A and set BA minus B—will always be a subset of set A.

Limitation of a Function

In Part 2 of this book, we’ll often be interested in a subset of a given function. Instead of all ordered pairs, we’ll only want to consider those ordered pairs whose first coordinate can be chosen from some given subset of the domain of the function. This leads us to the concept of a limitation of a function. Definition 4-4 defines the concept of a limitation of a function.

imagesDefinition 4-4: Limitation of a Function Let F be a function and A a set. The limitation of function F on set A; notation F↓A is defined as follows:

F↓A := { p | p∈F ∧ π1(p)∈A }

The limitation of a function F on set A is the subset of F that only holds those ordered pairs for which the first coordinate is an element of set A. Let’s illustrate this with a few examples in Listing 4-5. The aforementioned function G5 is reused in this listing.

Listing 4-5. Example Limitations of Functions

{ (X;2), (Y;1) }↓{X} = { (X;2) }
{ (X;2), (Z;2) }↓{Z} = { (Z;2) }
{ (X;2), (Z;2), (Y;1) }↓{X,Y,Z} = { (X;2), (Z;2), (Y;1) }
G5↓{dname,location} = { (dname;'Natural Join'), (location;'Utrecht') }
G5↓⊘=⊘

Note that a limitation of a function always results in another function; if you limit on the empty set the result is the empty function.

With the availability of this limitation operator, you’re able to give an alternative definition for compatibility of two functions; that is, the property two functions should have in order for the union of those two functions to result in another function. See Definition 4-5 for this alternative definition.

imagesDefinition 4-5: Alternative Definition for Compatibility (Joinability) of Functions Using Limitation “Function F is compatible with function G⇔ F↓(dom(F)∩dom(G)) = G↓(dom(F)∩dom(G))

Two functions are compatible (joinable) if, when limited to the intersection of their domains, these resulting limitations are equal.

Set Functions

Another concept that we’ll need in Part 2 of this book is the concept of a set function. Set functions will be used as the foundation for the definition of a database universe: a formal specification of a database design that will be introduced in Part 2.

As you know by now, a function is a set of ordered pairs, and an ordered pair is a pair of two elements for which the order is significant.

imagesNote No limitations have been put on the type of an element that is allowed to appear as a coordinate of an ordered pair. They can be of arbitrary complexity. For instance, coordinates are allowed to be sets.

A set function is a function in which every ordered pair holds a set as its second coordinate. Definition 4-6 formally defines this concept.

imagesDefinition 4-6: Set Function “F is a set function” ⇔ F is a function and (∀c∈dom(F): F(c) is a set)

As you can read from this definition, a set function is a function in which every range element is a set (possibly the empty set). Listing 4-6 provides various examples of set functions.

Listing 4-6. Example Set Functions

H1 := { (X;{1,2,3,4,5}), (Y;{1,2,3,4}) }
H2 := { (X;{1,2}), (Z;{3,4}) }
H3 := { (company;{'Natural Join','Centraal Boekhuis','Oracle','Remmen & De Brock'})
      , (location;{'Utrecht','Culemborg','De Meern','Eindhoven'}) }
H4 := { (empno;[1..99])
      , (ename;varchar(10))
      , (born;date)
      , (job;{'CLERK','SALESMAN','TRAINER','MANAGER','PRESIDENT'})
      , (salary;[1000..4999]) }

The first example H1 is a function (no first coordinate appears more than once). It is a set function because both H1(X) and H1(Y) evaluate to set values; {1,2,3,4,5} and {1,2,3,4}, respectively. The second example, function H2, is also a set function because H2(X) = {1,2} and H2(Z) = {3,4}.

Characterizations

You can consider set function H3 to enumerate, through the first coordinates of its ordered pairs, two aspects of a company: the name of the company (coordinate company) and the location of the company (coordinate location). Such first coordinates are referred to as attributes. Attached to these attributes are the value sets—as second coordinates of the ordered pairs— that could be considered to represent the set of possible values for these attributes of a company. Equally, you can consider set function H4 to characterize an employee. Note that the definition of H4 uses some of the set specification shorthand that was introduced in Table 2-3.

A set function is called a characterization when you use it to describe something in the real world by listing the relevant attributes in combination with the set of admissible values for each attribute. The relevant attributes are the first coordinates of the ordered pairs, and the sets of admissible values are their respective second coordinates.

We’ll refer to the first coordinates of the ordered pairs of a characterization as the attributes of the characterization, and we’ll refer to the second coordinates of the ordered pairs of a characterization as the attribute value sets for the attributes of the characterization.

As mentioned earlier, set function H4 can be considered a characterization representing the attributes of an employee. In this case, you’d only be interested in representing each employee’s number (attribute empno), name (attribute ename), birth date (attribute born), occupation (attribute job), and salary (attribute salary).

External Predicates

There’s another way of looking at characterizations. They can be considered to characterize a predicate in the real world; the characterization introduces the (names of the) parameters of the predicate and the corresponding value sets from which these parameters can take their values. For instance, characterization H4 characterizes the following predicate:

Employee ENAME is assigned employee number EMPNO, is born at date BORN, holds position JOB, and has a monthly salary of SALARY dollars.

This predicate has five parameters (ENAME, EMPNO, BORN, JOB, SALARY) that correspond to the five first coordinates in H4. The predicate is a natural language sentence describing the meaning and correlation of the involved attributes. In Part 2 of this book, a characterization will be at the basis of a table design. The predicate characterized by the characterization of a table design represents the user-understood meaning of that table design. Such a predicate is referred to as the external predicate of that table.

The Generalized Product of a Set Function

Having introduced set functions, we can now establish the concept of a generalized product of a set function (at first sight, frequently considered a complex concept). The generalized product takes a set function as its operand and produces a new set of functions.

The generalized product operator is a second key stepping stone, next to the powerset operator that was introduced in Chapter 2. Using these two set operators, we’ll be able to construct value sets that are usually fairly large, sets of tuples (introduced hereafter), sets of tables (introduced in Chapter 5), and a set of database states (introduced in Chapter 5). In Chapter 7, we’ll demonstrate a method that involves defining these value sets as building blocks for a formal specification of a database design.

The generalized product of a set function F, with the notation Π(F), generates a set with all possible functions that have the same domain as F, and for which the ordered pairs (in these functions) have a second coordinate that is chosen from (that is, is an element of) the relevant set that was attached to the same first coordinate in F. Definition 4-7 formally defines the generalized product operator.

imagesDefinition 4-7: Generalized Product of a Set Function Let F be a set function. The generalized product of F, notation Π(F), is defined as follows:

Π(F) = { f | f is a function ∧ dom(f) = dom(F) ∧ (∀c∈dom(f): f(c) ∈ F(c)) }

The last conjunct inside this definition specifies that every function generated by this operator has second coordinates that are elements of the relevant set (second coordinate) in the set function. Listing 4-7 gives a few examples that will quickly illustrate the effect this operator has when you apply it to a set function. Set functions H2 and H3 from the previous section are reused in this listing.

Listing 4-7. Example Applications of the Generalized Product

Π({ (a;{1,2,3}), (b;{4,5}) }) =
        { {(a;1), (b;4)}
        , {(a;1), (b;5)}
        , {(a;2), (b;4)}
        , {(a;2), (b;5)}
        , {(a;3), (b;4)}
        , {(a;3), (b;5)} }
Π(H2) = { {(X;1), (Z;3)}
        , {(X;1), (Z;4)}
        , {(X;2), (Z;3)}
        , {(X;2), (Z;4)} }
Π(H3) = { {(company;'Natural Join'), (location;'Utrecht')}
        , {(company;'Natural Join'), (location;'Culemborg')}
        , {(company;'Natural Join'), (location;'De Meern')}
        , {(company;'Natural Join'), (location;'Eindhoven')}
        , {(company;'Centraal Boekhuis'), (location;'Utrecht')}
        , {(company;'Centraal Boekhuis'), (location;'Culemborg')}
        , {(company;'Centraal Boekhuis'), (location;'De Meern')}
        , {(company;'Centraal Boekhuis'), (location;'Eindhoven')}
        , {(company;'Remmen & De Brock'), (location;'Utrecht')}
        , {(company;'Remmen & De Brock'), (location;'Culemborg')}
        , {(company;'Remmen & De Brock'), (location;'De Meern')}
        , {(company;'Remmen & De Brock'), (location;'Eindhoven')}
        , {(company;'Oracle'), (location;'Utrecht')}
        , {(company;'Oracle'), (location;'Culemborg')}
        , {(company;'Oracle'), (location;'De Meern')}
        , {(company;'Oracle'), (location;'Eindhoven')} }

The first example applies the generalized product to set function {(a;{1,2,3}), (b;{4,5})}. Let’s call this set function S1. The domain of S1 is {a,b}. The set of functions that results from the application of the generalized product on S1 holds six elements, each of which is a function. The first one enumerated is function {(a;1), (b;4)}. Let’s call this function f1. Function f1 has the same domain as the set function S1:

dom(S1) = dom(f1) = {a,b}

Note also that the following propositions hold: f1(a)∈S1(a) and f1(b)∈S1(b). The first proposition evaluates to 1∈{1,2,3}, which is true, and the second evaluates to 4∈{4,5}, which is also true. The conjunction of these two propositions corresponds to the universal quantification in Definition 4-7. In the same way, you can verify that the other five functions listed in the first example comply with Definition 4-7.

The second example holds four functions in the resulting set. Again, you can see that the domain of each of these functions equals {X,Z}, which is equivalent to dom(H2).

In the third example, the generalized product applied to characterization H3 results in a set of 16 functions. Every function can be considered to represent an actual possible company based on the example characterization for companies, H3.

We’ll refer to the functions in the result set of the generalized product of a set function as tuples. Note that such set functions will typically signify a characterization. The elements of a tuple—the ordered pairs—are called the attribute-value pairs of the tuple. In a relational database design, a tuple represents a proposition in the real world (see the sidebar “Closed World Assumption”).

Do you see how the cardinality of the resulting sets in Listing 4-7 can be computed?

#Π({ (a;{1,2,3}), (b;{4,5}) }) = #{1,2,3} * #{4,5} = 3 * 2 = 6
#Π(H2) = #H2(X) * #H2(Z) = #{1,2} * #{3,4} = 2 * 2 = 4
#Π(H3) = #H3(company) * #H3(location) = 4 * 4 = 16

The cardinality of the generalized product of a set function is equal to the product of the cardinalities of all range elements of the set function.

CLOSED WORLD ASSUMPTION

A Preview of Constraint Specification

Recall function H4 defined in Listing 4-6:

H4 = { (empno;[1..99])
     , (ename;varchar(10))
     , (born;date)
     , (job; {'CLERK','SALESMAN','TRAINER','MANAGER','PRESIDENT'})
     , (salary;[1000..4999]) }

You can consider Π(H4) as the set that has every possible representation of an employee—or instantiation of the external predicate, if you will—based on the attributes and corresponding attribute value sets given by characterization H4. This set holds a lot of tuples, yet it still has a finite cardinality because every range element of set function H4 is a finite set.

imagesNote For simplicity, we assume that only characters a through z (both lowercase and uppercase) are used for the varchar(n) set shorthand, which gives us 52 characters. The cardinality of date follows from its definition that was given in Table 2-3.

Following are the cardinalities of these range elements:

#[1..99] = 99
#varchar(10) = 5210 + 529 + 528 + ... + 521 = 7516865509350965246
#date = 3442447
#{'CLERK','SALESMAN','TRAINER','MANAGER','PRESIDENT'} = 5
#[1000..4999] = 4000

In Part 2 we’ll develop a variety of value sets that can be considered the type of certain variables. For tuple variables, we won’t be interested in the set that holds every possible tuple—based on H4—but rather in the set that holds every admissible tuple (admissible given certain rules in the real world); this is a certain subset of Π(H4). For instance, the following rules may be applicable to employees:

  • A CLERK has a salary of less than 2500.
  • A PRESIDENT has a salary of more than 4000.

You can define the set of admissible tuples by first creating the set of all possible tuples and then narrowing down this set by specifying the rules (as predicates) to which every tuple should conform. All ingredients for doing so have now been presented. Take a look at the specification of the following set (named t-emp):

t-emp := { t | t∈Π(H4) ∧ ( t(job)='CLERK' ⇒ t(salary)<2500 )
                       ∧ ( t(job)='PRESIDENT' ⇒ t(salary)>4000 ) }

In this example, we start combining the logic introduced in Chapters 1 and 3 with the set theory introduced in Chapter 2 and this chapter. Set t-emp is a subset of Π(H4); only those (possible) tuples that satisfy the two additional predicates given inside the set specification are in set t-emp. These predicates are the formal representation of the two informally described rules.

Function Composition

One final concept, function composition, needs to be introduced before you can finally move on to Part 2 of this book and see the full application of all mathematical concepts introduced in the first four chapters. You can use function composition to rename attributes of tuples; this is something you’ll occasionally need to apply when using the join (to be introduced in Part 2), union, intersection, or difference operators.

Function composition is about applying two functions, serially, one after the other. Given some domain element, you first apply one function that results in a range element. You then apply the second function on this resulting range element. You can find a formal definition for function composition in Definition 4-8.

imagesDefinition 4-8: Function Composition Let A, B, and C be sets. Let f be a function over A into B. Let g be a function over rng(f) into C.

The composition of functions f and g, notation g◊f (pronounced as “g of f” or “g after f”), is defined as follows: g◊f := { (a; g(f(a)) | a∈dom(f) ∧ f(a)∈dom(g) }

As you can see from the definition, the composition of two functions results in another set of ordered pairs. In fact, it results in another function because the domain of g◊f is equal to dom(f), and only one pair per domain element of function f is introduced in the composition g after f.

Let’s illustrate function composition with an example. Listing 4-8 introduces two functions, f and g, and shows the result of applying g after f (derived in three steps).

Listing 4-8. Example Functions f and g, and Their Function Composition g After f

f := { (num;empno), (name;ename), (occup;job) }
g := { (empno;999), (ename;'Smith'), (job;'Trainer') }
g◊f = { (num;g(f(num))), (name,g(f(name))), (occup;g(f(occup))) }
    = { (num;g(empno)), (name;g(ename)), (occup;g(job))
    = { (num;999), (name;'Smith'), (occup;'Trainer') }

This example specifically shows you how function composition can be employed to rename attributes of a given tuple (in this case function g). In this example, f is the renaming function and g is the tuple for which the attributes get renamed according to f.

The application of the first function (f) performs the renaming of the first coordinates (attribute empno is renamed to num, ename is renamed to name, and job is renamed to occup), and the application of the second function (g) reattaches the second coordinates to the renamed attributes.

imagesNote If the renaming function holds fewer attributes than the tuple whose attributes are being renamed, then in effect, you are at the same time performing a function limitation too. For instance, if f (from Listing 4-8) would be equal to {(num;empno)}, then g◊f results in {(num;999)}.

You can also draw a picture of function composition. Figure 4-3 shows a picture of the function composition of functions f and g, introduced in Listing 4-8.

images

Figure 4-3. Function composition g◊f

You can immediately see from this picture that coordinate num maps to range element 999, coordinate name maps to 'Smith', and coordinate occup maps to 'Trainer'.

Chapter Summary

This section provides a summary of this chapter, formatted as a bulleted list. You can use it to check your understanding of the various concepts introduced in this chapter before continuing with the exercises in the next section.

  • A binary relation from set A to set B is defined as a subset of the Cartesian product A×B. Formally, “R is a binary relation from set A to set B” ⇔ R∈℘(A×B).
  • A function is a binary relation where no first coordinate appears more than once. Formally, “F is a function over A into B” ⇔ F∈℘(A×B) ∧ (∀p1,p2∈F: p1≠p2 ⇒ π1(p1)≠ π1(p2) ) ∧ { π1(p) | p∈F } = A.
  • The set of all first coordinates of a function F is referred to as the domain of function F. The notation is dom(F). Therefore, if F is a function over A into B, then dom(F) = A.
  • The set of all second coordinates of a function F is referred to as the range of function F. The notation is rng(F). Therefore, if F is a function over A into B, then rng(F) ⊆ B.
  • If F is a function and x is an element of dom(F), then F(x) (the application of F to x) denotes the range element that is paired to x in function F.
  • Every subset of a function is a function.
  • Two functions are compatible (or joinable) if and only if for every first coordinate that the two functions have in common, the corresponding second coordinates match. Only if two functions are compatible will the union of those two functions be a function.
  • The limitation of a function F to set A is the subset of F that only holds those ordered pairs for which the first coordinate is an element of set A. The notation is F↓A.
  • A set function is a function in which every element (an ordered pair) holds a set as its second coordinate.
  • If a set function is used to represent the aspects of something in the real world that we want to capture, then we refer to such a set function as a characterization. The domain elements of a characterization are called the attributes of the characterization. The range elements are called the attribute value sets of these attributes.
  • The generalized product of a set function F, with the notation Π(F), generates a set with all possible functions that have the same domain as F, and for which the ordered pairs have a second coordinate that is chosen from the corresponding set attached to the same first coordinate in F.
  • The elements in a generalized product of a characterization can be considered to represent propositions in the real world and are referred to as tuples.
  • The first coordinates of the ordered pairs in a tuple are referred to as the attributes of that tuple.
  • The function composition of two given functions f and g, notation gf (g after f), can be used to rename the attributes of a tuple. In this case, f is the renaming function and g is the tuple for which the attributes get renamed according to f.

Exercises

  1. How many distinct binary relations can be constructed from a given set A with cardinality 3 to a given set B with cardinality 4?
  2. How many distinct functions can be constructed from a given set A with cardinality 3 to a given set B with cardinality 4?
  3. Suppose the following sets are given:
    A := {a, b, c}
    B := {c, d, e, f}

    Which of the following sets are functions from A to B?

    1. { (a;b), (b;d) }
    2. { (c;c), (b;f), (a;f) }
    3. { (a;c), (c;f), (a;c), (b;e) }
    4. { }
    5. { (x;y) | x∈{a,b} ∧ y∈{e,f} ∧ (x=a ⇒ y=f) }
    6. { (c;b), (d;f) }
  4. What is the domain of the following functions?
    1. { (X;1), (Y;1) }
    2. { (X;1), (Z;3), (X;1) }
    3. { (empno;100), (ename;'Codd'), (occupation;'mathematician') }
  5. What is the range of the following functions?
    1. { (X;1), (Y;3) }
    2. { (X;1), (Y;{1}) }
    3. { (X;⊘), (Z;3), (Y;(a;b)) }
    4. { (deptno;50), (dname;'Research'), (location;'London') }
  6. Which of the following expressions represent functions?
    1. { (X;1), (Y;3) } ∪ { (X;1), (Z;2), (R;1) }
    2. { (X;1), (Y;3) } ∩ { (X;1), (Y;2), (X;2) }
    3. { (X;1), (Y;3) } ∪ { (X;1), (X;2) }
    4. { (X;1), (Y;2) } ∩ { (X;1), (Y;2), (X;2) }↓{Y}
    5. {p | p∈{X,Y}×{1,2} ∧ (π1(p)=Y ⇒ π2(p)=2) }↓{Y}
  7. Give the result sets for the following expressions using the enumerative method:
    1. Π{ (a;{1}), (b;{1,2}) }
    2. Π{ (a;{1,2,3,4}), (b;{5,6,7,8}), (d;⊘), (e;{9,10}) }
    3. {t | t∈Π{(a;{1,2,3,4}), (b;{5,6,7,8})}∧t(a)+t(b)<8}
    4. {f | f∈Π{(X;{1,2}), (Y;{1,2,3})}∧#rng(f)=1 }
  8. Given the following function
    f := { (ean;9786012), (descr;'A very nice book')
         , (price;24.99), (category;A) }

    Provide the result sets for these expressions:

    1. f↓{ean,price}
    2. f↓dom(f)
    3. f↓{descr}
    4. f↓{description}
    5. rng(f↓{ean,price,category})
  9. Given the following characterization
    F := { (ean;num(7,0))
         , (descr;varchar(10))
         , (price;[1..99])
         , (stock;[0..10])
         , (category;{A,B,C}) }

    Which of the following tuples is an element of set Π(F)?

    1. { (ean;123), (stock;9), (category;C), (descr;'Nice article'), (price;10) }
    2. { (price;1), (category;A), (stock;11), (descr;'Fine book') (ean;987403) }
    3. { (ean;4572), (descr;'Book 4572'), (price;79), (stock;4), (category;A) }
    4. { (ean;4572), (descr;'Book 4572'), (stock;6), (price;0) }

      Give the result for the following expressions:

    5. #F
    6. F↓{price,stock}
    7. F(stock)×F(category)
  10. Given the following two functions
    f := { (empno;103), (ename;'Tedesco'), (deptno;11) }
    g := { (eno;103), (ename;'Tedesco'), (dno;11) }

    Specify function h that renames the empno attribute in f to eno and the deptno attribute to dno. Formally, specify function h such that (f◊h) = g.

  11. Given the following sets
    E := { { (empno;103), (ename;'Tedesco'), (mgrno;104), (deptno;10) }
          ,{ (empno;104), (ename;'Rhodes'), (mgrno;106), (deptno;10) }
          ,{ (empno;105), (ename;'Sparks'), (mgrno;106), (deptno;20) }
          ,{ (empno;106), (ename;'Weiss'), (mgrno;102), (deptno;10) } }
    D := { { (deptno;10), (dname;'Gaming') }
          ,{ (deptno;20), (dname;'Sales') }
          ,{ (deptno;30), (dname;'Research') } }
    f := { (empno;mgrno) }

    Investigate the truth value of these two propositions:

    1. { e◊f | e ∈ E} = { e↓{empno} | e ∈ E}
    2. {e↓{deptno} | e ∈ E} ⊂ {d↓{deptno} | d ∈ D}

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.105.255