From this chapter onward we’ll assume that you’re thoroughly familiar with all terminology introduced in Part 1 of this book. Only if we deem it necessary will we make explicit references back to where specific terminology was first introduced and defined.
We’ll start with an introduction of various interrelated terms that are used in this chapter and the other chapters of Part 2 that follow.
In Part 1 we hinted that a table can be considered a set of functions. In this chapter we’ll demonstrate this in detail. The section “Formal Specification of a Table” shows how you can formally represent a table. It also introduces a shorthand notation for a table and defines what’s called the heading of a table. We’ll familiarize you with a method to formally specify—in a predicative way—a table given a characterization. You already saw a preview of this in Chapter 4.
The section “Database States” deals with formally specifying a database state. The database state that is the current value of the database variable holds all tables of a database. This section also defines the concept of a database skeleton.
The section “Operations on Tables” discusses various common table operators: union, intersection, difference, projection, restriction, join, attribute renaming, extension, and aggregation. These operators take tables as their operands and produce another table.
You’ll find a “Chapter Summary” at the end, followed by an “Exercises” section.
This section introduces various interrelated terms that we’ll use in the chapters of Part 2. Because some of these terms might have a meaning to you that differs from the one we intend to use in this book, or because some of these terms are new to you, we decided to introduce them here in advance. This introduction is informal. Two of these terms (database design and table design) were already used in Chapter 4. Some of the terms are formal terms; we’ll specify these precisely in this part of the book.
NoteYou might find the informal introduction of some of these terms somewhat vague. If so, then don’t worry about it now. Just start reading the chapters of Part 2 and revisit this informal introduction of the terms later.
Database design is the informal term for the blueprint of a set of data structures that are used to store business data, including the integrity requirements of this data. In this book, these blueprints will always be relational designs; they will conform to the relational model of data. Among other things, this implies that the aforementioned data structures are table data structures; that is, the data is stored via tables (using the meaning of a table as introduced hereafter).
We’ll often use a business term to indicate what particular database design we’re talking about; for instance, the HR (human resources) database design, or the education department database design.
Note The other meaning often attributed to database design is the process of developing such a blueprint. This process includes activities such as requirements analysis, conceptual database design, logical database design, and physical database design. Using this other meaning, you can say that this book refers to a database design as the end result of the logical database design (process). In Part 3 (Chapter 11) we’ll look into the physical database design using an SQL DBMS.
Database variable is an implementation term. It is a variable (that is, a holder for a value) that is capable of storing actual values for the data structures described by a database design. In Chapter 7 we’ll provide a mathematical treatment of the type of—or value set for—a database variable. This value set is referred to as the database universe. At any given point in time and space, a database variable will hold an element of this value set as its current value. The elements of a database universe are referred to as database states. Note that a database consists of exactly one database variable. In a sense, that variable is the database.
Like every variable, the database variable will have a name, allowing us to talk about it without knowing which database state it currently holds.
Note An SQL DBMS does not supply the concept of a database variable. Instead, SQL DBMS vendors have implemented the concept of table variables. A table variable can hold a table as its value; in an SQL DBMS you create one through the create table
statement. In these DBMSes, a database design is implemented by creating as many table variables as the database design has table structures, and defining their respective types (instead of defining just one database variable and defining its type).
Database universe is a formal term. It is the mathematical construct used to formally specify the table data structures and integrity requirements of a database design. A database universe is the main part of the formal specification of a database design; it precisely describes the data values (tables) that we allow to be stored by the table data structures. Chapter 7 will establish the formal definition of a database universe; for now just think of it as a set of all values that we allow for a given database variable to hold over the course of time.
We’ll often use a shorthand name for a given database universe (just symbols) for convenience of reference in the text.
Database state is a formal term. It is a mathematical construct used to formally specify a value that can be held by a database variable. This chapter will establish the formal definition of a database state. We’ll often use a shorthand name for a given database state for convenience of reference in the text.
Database is an informal term for the actual (business) data that is stored in a computer in a structured way (that is, conforming to the database design). Abstractly, a database is the current value of the database variable; it is the informal term for database state.
A DBMS is a computer program used to manage, query, and update the value of a database variable.
Table design is the informal term for the blueprint of one of the table data structures of a database design. A relational database design typically consists of several table designs and a set of constraints (see Chapter 7) addressing the integrity requirements for the database. It is often convenient to talk about a single table design, and for this reason we’ll use a business term to indicate what particular table design we’re talking about; for instance, the employee table design, or the department table design.
Table structure, short for table data structure, is an implementation term. It is used to refer to the data structure that implements a specific table design. As mentioned before in the implementation of a database design, you are accustomed to using SQL table variables to implement the separate table data structures. You could regard a table structure as a table variable. However, we deliberately do not use the term table variable in this book; the methodology developed in this book only knows the concept of a database variable. A table structure can be considered a specific subsection of the database variable: the subsection that holds the current table for only one of the involved table designs.
In Chapter 10 we’ll discuss a formal way of specifying updates (value changes) of a database variable; we consider updating just one of the table structures of a database as an update of the database variable. However, when explaining our formal methodology we’ll often conveniently use the term table structure as if it were a separately addressable (table) variable, and we’ll even give such a table data structure a name—as we would give a name to a table variable—enabling us to talk about it without knowing which value is currently held by the data structure.
Table is a formal term. It is a mathematical construct used to formally specify a value that can be held by a table structure that implements a specific table design (or if you will, a value that can be held by a table variable). We’ll formally establish this term in the next section. We’ll often use a shorthand name for a given table for convenience of reference in the text.
In Chapter 4 you saw that you can use a tuple (a set of ordered pairs) to describe a true proposition in the real world. For instance, the tuple { (empno;100), (ename;'Smith'), (job;'MANAGER') }
might represent this proposition: “Employee with number 100
has name Smith
and is employed as a MANAGER
.” In this section we’ll build on this use of tuples and introduce you to a formal way of specifying a table.
A table represents many (zero or more) true propositions of the same kind. If you can use a tuple to represent a single proposition, then evidently a table can be regarded as a set of tuples. Take a look at the set of tuples, which we have named T1
, in Listing 5-1.
T1 :=
{ { (partno;1), (name;'hammer'), (instock;22), (price;10) }
,{ (partno;2), (name;'screwdriver'), (instock;19), (price;5) }
,{ (partno;3), (name;'axe'), (instock;0), (price;30) }
,{ (partno;4), (name;'saw'), (instock;4), (price;15) }
,{ (partno;5), (name;'wrench'), (instock;7), (price;20) }
,{ (partno;6), (name;'scissors'), (instock;32), (price;5) } }
Set T1
contains six tuples; in other words, six functions (recall that a tuple can be regarded as a function). All six functions have the same domain: {partno,name,instock,price}
. We’ll refer to a set of functions that all share the same domain as a table.
The tuples in T1
represent six propositions. They can be regarded as instantiations of the following external predicate, for instance: “The part with part number partno
is named name
, instock
items of it are currently held in stock, and its price is price
.”
Note The English word “table” here is being used in a different sense than its normal meaning (twodimensional picture of columns and rows). In this book we’ll use table to denote a set of functions. However, you’ll see that a table is usually represented in columns and rows.
Definition 5-1 describes the concept of a table in a formal way. It uses the concept of “a function over a given set” (see Definition 4-2).
Definition 5-1: Table If T
and H
are sets, then “T
is a table over H
” ⇔( ∀t∈T: t is a function over H )
.
This definition is generic in the sense that no restrictions are imposed upon the elements of set H
. However, in practice we’ll only be interested in those cases where H
is a set of names representing attributes.
Table T1
can be considered a parts table. It is a table over {partno,name,instock,price}
, consisting of six tuples, each representing information about a different part. It holds for each such part the part number, its name, how many items of the part are in stock, and the price of the part.
Here are a few more examples:
T2 := { { (X;2), (Y;1) }, { (Y;8), (X;0) }, { (Y;10), (X;5) } }
T3 := { { (partno;3), (name;'hammer') }, { (pno;4), (pname;'nail') } }
T4 := { { (empno;105), (ename;'Mrs. Sparks'), (born;'03-apr-1970') }
,{ (empno;202), (ename;'Mr. Tedesco') } }
T2
is indeed a table. It holds three functions, all of which share the domain {X,Y}
. It is a table over {X,Y}
. Note that the order of the pairs (inside the functions) doesn’t matter. T3
is not a table. It is a set of functions; however, the domain of the first function is {partno,name}
, which differs from the domain of the second function: {pno,pname}
. Likewise, T4
is also not a table. It is a set of functions; however, the domain of the first function is {empno, ename, born}
, which differs from the domain of the second function: {empno,ename}
.
An element of a table is a function, and each such function is referred to specifically as a tuple. In Chapter 4 you were introduced to this term when we introduced the generalized product of a characterization (see the section “The Generalized Product of a Set Function”). The generalized product of a characterization is in fact a table; it holds functions, all of which share the same domain.
A table is a set, and the elements of this set are tuples. By the definition of a set, this implies that every tuple is unique within that set; no tuple can appear more than once in the same table.
If T
is a table over H
, then every proper, non-empty subset of T
(containing fewer tuples than T
) is of course also a table over H
.
Even the empty set (∅
) is a table. In fact, under Definition 5-1 (which quantifies over the elements in the table), the empty set is a table over any set. You might want to revisit the rewrite rules in Table 3-2 for this.
Note The empty set is often used as the initial state for a given table structure.
Writing down a table using the formal enumerative method as introduced in Listing 5-1 is quite elaborate. For every tuple that’s an element of the table, you’re essentially repeating the (shared) domain as the first coordinates of the ordered pairs. To avoid this repetition, it’s common to draw a picture of a table. In this picture you list the names of the attributes of the tuples only once (as column headers) and under those you then list the attribute values for every tuple (one per row). Figure 5-1 shows this shorthand notation of a table. It does so for table T1
introduced in Listing 5-1.
As you can see, this looks like a table in the common language sense (a two-dimensional picture of columns and rows). But remember, a table—in this book—is a set of functions. This has two important implications. First, this means that the order in which functions are enumerated is arbitrary; it does not matter. And second, the order in which the ordered pairs are enumerated (within a tuple) does not matter either. Therefore, in the preceding shorthand notation, the order of the column headers (left to right) and the order of the rows (top to bottom) don’t matter.
Note In the shorthand notation, an ordering to the attributes has been introduced because the order of attribute values in each tuple now has to correspond to the ordering of the column headings.
The shorthand notation demonstrates some other terminology that is often used when dealing with tables. As you can see, table T1
is a table over {partno,name,instock,price}
. This shared domain of the tuples in T1
is referred to as the heading of table T1
(see Definition 5-2).
Definition 5-2: Heading of a Table If “T
is a table over H
” then H
is referred to as the heading of T
.
We’ll often use the shorthand notation in the remainder of this book to illustrate a particular table. However, you should never forget that it is only a shorthand notation. In this book, a table is formally defined as a set of functions, all of which share the same domain.
This formal definition of a table enables you to deal with operations on tables (later in this chapter) and data integrity predicates (discussed in the next chapter) in a clear and formal way too.
You can construct a table from a given set function by applying the generalized product to it. Let’s demonstrate this with an example. Listing 5-2 displays a set function called F1
.
F1 := { (X; {0,1,2})
,(Y; {0,1,2})
,(Z; {-1,0,1}) }
The generalized product of set function F1
will result in a set of twenty-seven functions (three times three times three). Listing 5-3 shows this result.
Π(F1) :=
{ { (X;0), (Y;0), (Z;-1) }, { (X;0), (Y;0), (Z;0) }, { (X;0), (Y;0), (Z;1) }
, { (X;0), (Y;1), (Z;-1) }, { (X;0), (Y;1), (Z;0) }, { (X;0), (Y;1), (Z;1) }
, { (X;0), (Y;2), (Z;-1) }, { (X;0), (Y;2), (Z;0) }, { (X;0), (Y;2), (Z;1) }
, { (X;1), (Y;0), (Z;-1) }, { (X;1), (Y;0), (Z;0) }, { (X;1), (Y;0), (Z;1) }
, { (X;1), (Y;1), (Z;-1) }, { (X;1), (Y;1), (Z;0) }, { (X;1), (Y;1), (Z;1) }
, { (X;1), (Y;2), (Z;-1) }, { (X;1), (Y;2), (Z;0) }, { (X;1), (Y;2), (Z;1) }
, { (X;2), (Y;0), (Z;-1) }, { (X;2), (Y;0), (Z;0) }, { (X;2), (Y;0), (Z;1) }
, { (X;2), (Y;1), (Z;-1) }, { (X;2), (Y;1), (Z;0) }, { (X;2), (Y;1), (Z;1) }
, { (X;2), (Y;2), (Z;-1) }, { (X;2), (Y;2), (Z;0) }, { (X;2), (Y;2), (Z;1) } }
In this result set, every function has the same domain {X,Y,Z}
; the result set is therefore a table over {X,Y,Z}
. Now take a look at the following, more realistic, example. Listing 5-4 shows a characterization of a part. For a part, the attributes of interest are the number of the part (partno
), its name (name
), the quantity in stock of this part (instock
), and the part’s price (price
).
chr_PART := { (partno; [1..999])
,(name; varchar(12))
,(instock; [0..99])
,(price; [1..500]) }
The generalized product of chr_PART
is a rather large table; it contains all possible tuples that can be generated using the given attribute-value sets that are introduced in the definition of chr_PART
. Note that every tuple inside table T1
—introduced at the beginning of this chapter— is an element of Π(chr_PART)
. This makes T1
a subset of Π(chr_PART)
.
Note Every subset of Π(chr_PART)
—not just T1
—is a table over {partno,name,instock,price}
.
In fact, every tuple in table T1
is an element of the following set (named T2
) that is based on Π(chr_PART)
:
T2 := { t | t∈Π(chr_PART) ∧ ( t(price) ≥ 20 ⇒ t(instock) ≤ 10 )
∧ ( t(price) ≤ 5 ⇒ t(instock) ≥ 15 )
}
Inside the definition of T2
, two predicates are introduced that condition the contents of T2
. The first condition states that if the price of a part is 20
or more, then the quantity in stock for this part should be 10
or less. The second condition states that if the price of a part is 5
or less, then the quantity in stock for this part should be 15
or more. T2
—a table over {partno,name,instock,price}
—will hold all and only those tuples of Π(chr_PART)
for which both these two conditions are true. Note that because Π(chr_PART)
will hold many tuples that violate one or both of these conditions, T2
is a proper subset of Π(chr_PART)
. Because all tuples of T1
conform to these two conditions, T1
is a subset of T2
. In fact, it is a proper subset of T2
.
In Chapter 7 we’ll revisit this way of defining T2
; that is, taking the generalized product of a characterization and “plugging in” additional predicates.
A database is a representation of the state of affairs of some organization. It consists of a table for every kind of proposition about this organization that we would like to record in the database. This section introduces you to a formal way of specifying a database via a database state.
Let’s consider a simple database design involving two table structures: one for employees and one for departments. Take a look at tables EMP1
and DEP1
, which are displayed in Figures 5-2 and 5-3.
Let’s assume that tables EMP1
and DEP1
represent the current state of the employee and department table structures, respectively. We can formally specify the database state consisting of tables EMP1
and DEP1
as a function. In this function, the first coordinates of the ordered pairs represent the table structure names and the second coordinates hold the (current) values—tables EMP1
and DEP1
—for these table structures. Listing 5-5 gives the formal specification of this example database state.
DBS1 :=
{ (EMPLOYEE;
{ {(empno;101),(ename;'Chris'), (job;'MANAGER'),(sal;7900),(deptno;10)}
,{(empno;102),(ename;'Kathy'), (job;'TRAINER'),(sal;6000),(deptno;12)}
,{(empno;103),(ename;'Thomas'),(job;'CLERK'), (sal;2100),(deptno;10)}
,{(empno;104),(ename;'David'), (job;'TRAINER'),(sal;5600),(deptno;10)}
,{(empno;105),(ename;'Renu'), (job;'CLERK'), (sal;3000),(deptno;12)}
,{(empno;106),(ename;'Bob'), (job;'MANAGER'),(sal;8500),(deptno;10)}
,{(empno;107),(ename;'Sue'), (job;'CLERK'), (sal;2700),(deptno;12)} }
)
,(DEPARTMENT;
{ {(deptno;10),(dname;'RESEARCH'),(loc;'DENVER'), (salbudget;50000)}
,{(deptno;11),(dname;'SALES'), (loc;'DENVER'), (salbudget;20000)}
,{(deptno;12),(dname;'SUPPORT'), (loc;'LOS ANGELES'), (salbudget;40000)}
,{(deptno;13),(dname;'SALES'), (loc;'SAN FRANCISCO'), (salbudget;20000)} }
)
}
Database state DBS1
is a function containing just two ordered pairs. The first coordinate of the first ordered pair listed is EMPLOYEE
(the name we chose for the employee table structure), and the corresponding second coordinate is table EMP1
. Likewise, in the second ordered pair you’ll notice that we chose DEPARTMENT
as the name for the department table structure. In this case, the second coordinate is table DEP1
.
Given this definition of function DBS1
, you can now—using function application—refer to expressions such as DBS1(EMPLOYEE)
, which denotes table EMP1
, and DBS1(DEPARTMENT)
, which denotes table DEP1
.
To specify a database state, you not only need actual tables (EMP1
and DEP1
in the preceding example), but you also need to decide upon names for the table structures (EMPLOYEE
and DEPARTMENT
in the preceding example).
You probably won’t be surprised by now that the formal specification of a database design (which we’ll demonstrate in Chapter 7) also holds the specification of a characterization for every table design involved. You choose the names of the attributes involved in a table design when you specify the characterization for that table design.
A database skeleton collects all these names—for the table structures and the involved attributes—into a single formal structure. A database skeleton is a set function with an ordered pair for every table structure. Every first coordinate introduces the table structure name, and the second coordinate introduces the set of names of the involved attributes.
Listing 5-6 displays the database skeleton for the employee/department database design introduced in the previous section.
SK1 :=
{ (EMPLOYEE; {empno,ename,job,sal,deptno} )
,(DEPARTMENT; {deptno,dname,loc,salbudget} ) }
As you can see, set function SK1
introduces the names EMPLOYEE
and DEPARTMENT
for the table structures involved in the employee/department database design, and it attaches the set of names of the relevant attributes to them.
Note You should carefully choose the names introduced by the database skeleton, because they not only constitute the vocabulary between you (the database professional) and your customer (the users), they are also the first stepping stone to understanding the meaning (semantics) of a database design. You’ll see in the following chapters that data integrity constraints form a further important stepping stone for the understanding of the semantics of the database design.
This section covers some important table operators. You will apply these operators when specifying queries and transactions (Chapters 9 and 10), or certain types of predicates (Chapters 6, 7, and 8).
We’ll first take a look at the well-known set operators union, intersection, and difference. Next we’ll investigate the projection and restriction of a table, followed by the join—an important operator in the database field—and closely related to the join, the attribute renaming operator. Finally, we’ll deal with extension and aggregation.
Because tables are sets, you can apply the well-known set operators, union, intersection, and difference, with tables as their operands. This section will explore the application of these set operators on tables.
We’ll use three example tables, named E1
, E2
, and E3
, in the following sections. Figure 5-4 displays these tables.
As you can see, E1
and E2
are both tables over {EMPNO,ENAME,JOB}
, and E3
is a table over {E#,NAME,JOB,SAL}
. All three tables represent information about employees. For E3
, some of the names of the attributes were chosen differently, and E3
holds additional information (the salary). Employee 102
occurs in both table E1
and table E2
(with the same attribute values). Employee 101
occurs in both table E1
and table E3
.
As you probably know, the union of two sets holds all objects that are either an element of the first set, or an element of the second set, or an element of both. Here is the union of tables E1
and E2
, denoted by E1∪E2
:
E1∪E2 =
{ {(EMPNO;101), (ENAME;'Anne'), (JOB;'TRAINER') }
,{(EMPNO;102), (ENAME;'Thomas'), (JOB;'SALESMAN') }
,{(EMPNO;103), (ENAME;'Lucas'), (JOB;'PRESIDENT')}
,{(EMPNO;104), (ENAME;'Pete'), (JOB;'MANAGER') } }
The union of E1
and E2
is a table over {EMPNO,ENAME,JOB}
. It holds four tuples, not five, because the tuple of employee 102
is a member of both sets.
Now take a look at the union of tables E1
and E3
(E1∪E3
):
E1∪E3 =
{ {(EMPNO;101), (ENAME;'Anne'), (JOB;'TRAINER') }
,{(EMPNO;102), (ENAME;'Thomas'), (JOB;'SALESMAN') }
,{(EMPNO;103), (ENAME;'Lucas'), (JOB;'PRESIDENT') }
,{(E#;101), (NAME;'Anne'), (JOB;'TRAINER'), (SAL;3000)}
,{(E#;102), (NAME;'John'), (JOB;'MANAGER'), (SAL;5000)} }
This result is a set of functions, but it clearly isn’t a table; not all the functions in this result set share the same domain.
Remember the closure property (in the section “Union, Intersection, and Difference” in Chapter 2)? We’re only interested in those cases where the union of two tables results in another table. The union operator is evidently not closed over tables in general. It’s only closed over tables if the operands are tables that have the same heading. If the operands of the union operator are non-empty tables over different headings, then the resulting set won’t be a table.
Note the special case where the empty table is involved as an operand. The union of a given table with the empty table (∅
) always results in the given table; because ∅
is a table over any heading, the closure property holds.
The intersection of two sets holds all objects that are an element of the first set and an element of the second set. Here’s the intersection of tables E1
and E2
(E1∩E2
):
E1∩E2 = { {(EMPNO;102), (ENAME;'Thomas'), (JOB;'SALESMAN')} }
The intersection of E1
and E2
is a table. It’s probably not difficult to see that the intersection of two tables with the same heading will always result in another table. Note that the intersection is also closed over tables when the operands are tables over different headings. However, the intersection is useless in these cases, because it then always results in the empty table; you might want to check this by investigating the intersection of E1
and E3
(tables with different headings).
You can meaningfully intersect tables E1
and (part of ) E3
, but first you’d have to transform one of these in such a way that it has the same heading as the other table. The concepts that enable you to do so have all been introduced in Chapter 4: function limitation and function composition.
Take a look at the following definition for table E4
. It renames attributes E#
and NAME
of table E3
.
E4 := { e◊{(EMPNO;E#),(ENAME;NAME),(JOB;JOB),(SAL;SAL)} | e∈E3 }
Set E4
is a table over {EMPNO,ENAME,JOB,SAL}
. We used function composition (◊
; see Definition 4-8) to rename two attributes. Attribute E#
is renamed to EMPNO
and attribute NAME
is renamed to ENAME
. The other two attributes (JOB
and SAL
) are left untouched.
Next we need to get rid of the extra SAL
attribute (which is also not part of the heading of E1
). For this we use function limitation (↓
, see Definition 4-4). Take a look at the definition of E5
:
E5 := { e↓{EMPNO,ENAME,JOB} | e∈E4 }
E5
equals the following set:
{ {(EMPNO;101), (ENAME;'Anne'), (JOB;'TRAINER')}
,{(EMPNO;102), (ENAME;'John'), (JOB;'MANAGER')} }
The intersection of E5
with E1
has now become meaningful, and results in the following set:
E5∩E1 = { {(EMPNO;101), (ENAME;'Anne'), (JOB;'TRAINER')} }
Last, we note the special case where the empty table is involved as an operand. The intersection of a given table with the empty table always results in the empty table.
The difference of two sets holds all objects that are an element of the first set and that are not an element of the second set. Here is the difference of tables E1
and E2
(E1–E2
):
E1–E2 =
{ {(EMPNO;101), (ENAME;'Anne'), (JOB;'TRAINER') }
,{(EMPNO;103), (ENAME;'Lucas'), (JOB;'PRESIDENT')} }
Again, as you can see, the result is a table over {EMPNO,ENAME,JOB}
. The difference of two tables with the same heading always produces another table. Like the intersection, the difference operator is also closed over tables when the operands are tables over different sets. However, here too, in these cases the difference is useless because it always results in the first table; the second table cannot have tuples that are in the first table (due to the different headings). Hence, tuples will never be “removed” from the first set.
Because the difference operator is not commutative, we note the two special cases when the empty set is involved as an operand (in contrast with the preceding intersection and union). Let T
be a given table; then T–∅
always results in a given table T
, and ∅–T
always results in the empty table.
Another important table operator that needs to be introduced is the projection of a table on a given set. The projection of a given table—say T
—on a given set—say B
—performs the limitation of every tuple in T
on set B
. We use symbol ⇓
to denote projection. The projection operator can be viewed as a version of the limitation operator that has been lifted to the table level. Definition 5-3 formally defines this operator.
Definition 5-3: Projection of a Table Let T
be a set of functions and B
a set. The projection of T
on B
, notation T⇓B
, is defined as follows:
T⇓B := { t↓B | t∈T }
The projection of T
on B
holds every function in T
limited to B
. Although the preceding definition describes the projection for each set of functions T
and each set B
, we are mainly (but not exclusively) interested in those cases in which T
is a table, and moreover in which B
is a (non-empty) proper subset of the heading of such a table.
Let’s take a look at an example to illustrate the concept of projection. Listing 5-7 introduces table T3
. It holds five tuples with domain {empno,ename,salary,sex,dno}
.
T3 := { {(empno;10), (ename;'Thomas'), (salary;2400), (sex;'male'), (dno;1)}
,{(empno;20), (ename;'Lucas'), (salary;3000), (sex;'male'), (dno;1)}
,{(empno;30), (ename;'Aidan'), (salary;3000), (sex;'male'), (dno;2)}
,{(empno;40), (ename;'Keeler'), (salary;2400), (sex;'male'), (dno;1)}
,{(empno;50), (ename;'Elizabeth'), (salary;5600), (sex;'female'), (dno;2)} }
Here’s the result of the projection of T3
on {salary,sex}
, denoted by T3⇓{salary,sex}
:
T3⇓{salary,sex} =
{ {(salary;2400), (sex;'male')}
, {(salary;3000), (sex;'male')}
, {(salary;5600), (sex;'female')} }
Note that the projection of T3
on {salary,sex}
results in another table: a table over {salary,sex}
. This table has only three elements, whereas T3
has five elements. The first and the fourth tuple enumerated in the definition of T3
result in the same (limited) function, as do the second and the third function enumerated in T3
. Therefore, only three tuples remain in the resulting set of this projection.
You can now specify E5
introduced in the section “Intersection” with E4⇓{EMPNO,ENAME,JOB}
.
Tables typically hold many tuples. Often you’re only interested in some of these tuples: tuples that have a certain property. You want to look at a subset of the tuples in the given table. You can derive a subset of a given table through table restriction. This operator does not require a new mathematical symbol; you can simply use the predicative method to specify a new set of tuples (that is, a new table) that’s based on the given table.
Using the employee table T3
from Listing 5-7, let’s assume you want to restrict that table to only male
employees who have a salary greater than 2500
. Here is how you would specify that (we have named this result T4
):
T4 := { e | e∈T3 ∧ e(sex)='male' ∧ e(salary)>2500 }
Table T4
has two tuples: the ones representing employees 'Lucas'
and 'Aidan'
, who are the only male
employees in T3
that earn more than 2500
.
The specification of T4
is an example of a simple case of table restriction, and one you will use a lot. Here is the general pattern for this kind of restriction:
{ t | t∈T ∧ P(t) }
In this expression, T
represents the table that is being restricted and P(t)
represents a predicate with one parameter of some tuple type. Predicate P
can be a simple predicate or a compound predicate (that is, one that involves logical connectives). This predicate will typically have expressions that involve function application using argument t
; for each tuple of T
you can decide if it remains in the result set by inspecting one or more attribute values of (only) that tuple.
Restrictions of a table can be a bit more complex. For instance, let’s assume that we want to restrict table T3
to only the male
employee who earns the most (among all males) and the female
employee who earns the most (among all females). A way to specify this restriction of T3
, which we will name T5
, is as follows:
T5 := { e | e∈T3 ∧ ¬(∃e2∈T3: e2(sex)=e(sex) ∧ e2(sal)>e(sal)) }
Table T5
will have every tuple (say e
) of T3
for which there is no other tuple in T3
(say e2
), such that e
and e2
match on the sex
attribute, and the sal
value in e2
is greater than the sal
value in e
. Note that this particular restriction actually results in a table of three tuples; there are two male employees in T3
who both earn the most.
This more complex case of restricting a table conforms to another pattern that is often applied. Here is the general pattern for this kind of restriction:
{ t | t∈T ∧ P(t,T) }
In this pattern, P(t,T)
represents a predicate with two parameters. The first parameter takes as a value a tuple from table T
, and the second one takes as a value the actual table T
. For each tuple of T
, you decide if it remains in the result set, not only by inspecting one or more attribute values of that tuple, but also by inspecting other tuples in the same table.
Restrictions can also involve other tables (other than the table being restricted). Say we have a table named D1
over {dno,dname,loc}
, representing a department table (with the obvious semantics). Let’s assume that we want to restrict table T3
to only the male
employees who earn more than 2500
and who are employed in a department that is known in D1
and located in San Francisco ('SF'
). A way to specify this restriction of T3
is as follows:
{ e | e∈T3 ∧ e(sex)='male' ∧ e(sal)>2500 ∧
(∃d∈D1: e(dno)=d(dno) ∧ d(loc)='SF') }
This case of restricting a table conforms to the following pattern:
{ t | t∈T ∧ P(t,S) }
Here S
represents some given table (that differs from T
). For each tuple of T
, you now decide if it remains in the result set by not only inspecting one or more attribute values of that tuple, but also by inspecting tuples in another table.
As you’ll understand by now, there are many more ways to perform a restriction of a given table; any number of other tables (including the table being restricted) could be involved.
In practice it’s also generally possible to specify such a restriction on a table that is the result of a join (see the next section).
In a database you’ll often find that two tables are related to each other through some attribute (or set of attributes) that they share. If this is the case, then you can meaningfully combine the tuples of these tables.
This combining of tuples is done via the compatibility concept of two tuples (you might want to go back to Definition 4-3 and refresh your memory). Every tuple from the first table is combined with the tuple(s) from the second table that are compatible with this tuple. Let’s demonstrate this with an example. Figures 5-5 and 5-6 introduce table T6
(representing employees) and table T7
(representing departments).
You can combine tuples from tables T6
and T7
that correspond on the attributes that they share, in this case the deptno
attribute. The resulting table from this combination has as its heading the union of the heading of T6
and the heading of T7
.
Figure 5-7 displays the result of such a combination of tables T6
and T7
.
You can formally specify this table in a predicative way as follows:
{ e∪d | e∈T6 ∧ d∈T7 ∧ e(deptno)=d(deptno) }
Every tuple from T6
is combined with the tuples in T7
that have the same deptno
value. This combining of tuples of two “related” tables is known as the join of two such tables. Definition 5-4 formally defines this operator.
Definition 5-4: The Join of Two Tables Let R
and T
be two (not necessarily distinct) tables. The join of R
and T
(notation R⊗T
) is defined as follows:
R⊗T := { r∪t | r∈R ∧ t∈T ∧ "r and t are compatible" }
The join of two tables, say R
and T
, will hold the union of every pair of tuples r
and t
, where r
is a tuple in R
, and t
is a tuple in T
, such that the union of r
and t
is a function (compatibility concept). With this definition, you can now specify T8
(from Figure 5-7) as T6⊗T7
.
The set of attributes that two tables, that are being joined, have in common is referred to as the set of join attributes. We will also sometimes say that two tuples are joinable, instead of saying that they are compatible.
As mentioned at the end of the previous section on restriction, you’ll often restrict the join of two tables. For example, here is the restriction of the join of tables T6
and T7
to only those tuples that represent clerks located in Denver:
{ t | t∈T6⊗T7 ∧ t(job)='CLERK' ∧ t(loc)='DENVER' }
Note that Definition 5-4 is not restricted to those cases in which the operands actually have at least one attribute in common. You’re allowed to join two tables for which the set of join attributes is empty (the compatibility concept allows for this); in this case every tuple from the first table is compatible with every tuple from the second table (this follows from Definition 4-3). You are then in effect combining every tuple of the first table with every tuple of the second table. This special case of a join is also referred to as a Cartesian join.
In general, a Cartesian join is of limited practical use in data processing. A Cartesian join that involves a table of cardinality one is reasonable and sometimes useful. Here is an example of such a Cartesian join:
T7⊗{ { (maxbudget;75000) } }
This expression joins every tuple of T7
with every tuple of the right-hand argument of the join operator, which is a table over {maxbudget}
. It effectively extends every department tuple of T7
with a new attribute called maxbudget
. This Cartesian join is in fact a special case of table extension, which will be treated shortly hereafter in the section “Extension.”
Note also an opposite special case of a Cartesian join, in which you join two tables that share the same heading. That is, the set of join attributes equals this shared heading. In such a join you are in effect intersecting these two tables. This makes the intersection a special case of the join (and therefore a redundant operator).
Sometimes two tables are related to each other such that joining these two tables makes sense. However, the names of the attributes with which the joining should be performed don’t match; the set of join attributes is the empty set. In these cases, the join operator won’t work as intended; it will perform a Cartesian join. The way a join is defined requires the names of the intended join attributes to be the same. For instance, suppose the deptno
attribute in the preceding table T7
was named dept#
. Performing the join of T6
and T7
—T6⊗T7
—would then result in a Cartesian join of T6
and T7
. To fix this, you must first either rename attribute deptno
in T6
to dept#
, or rename attribute dept#
in T7
to deptno
.
Also, sometimes performing the intersection, union, or difference of two tables makes sense; however, the headings of the two tables differ, not in cardinality, but in the names of the attributes. To fix this too, you must first rename attributes of the tables involved such that they have equal headings.
For this, we introduce the attribute renaming operator; Definition 5-5 formally defines this operator.
Definition 5-5: The Attribute Renaming Operator Let T
be a table and f
be a function. The renaming of attributes in T
according to f
(notation T◊◊f
) is defined as follows:
T◊◊f := { t◊f | t∈T }
.
You can view the attribute renaming operator as a version of the function composition operator that has been lifted to the table level.
Using this operator, we can rename attribute deptno
in T6
to dept#
as follows:
T6◊◊{(empno;empno),(ename;ename),(job;job),(sal;sal),(dept#;deptno)}
Note that the renaming function f
holds an ordered pair for every attribute in the heading of T6
. All second coordinates in the ordered pairs represent the current attribute names of the table. Only if an attribute needs to be renamed do the first and second coordinates of the ordered pair differ; the first coordinate will have the new name for the attribute.
Sometimes you want to add attributes to a given table. You want to introduce new attributes to the heading of the table, and supply values for these new attributes for every tuple in the table. Performing this type of operation on a table is referred to as extending the table, or performing table extension. Table extension does not require a new mathematical symbol; you can simply use the predicative method to specify a new set of tuples (that is, a new table) that’s based on the given table.
Let’s take a look at an example of table extension. Figure 5-8 shows two tables. The one at the left is a table over {empno,ename}
; it is the result of projecting table T6
(see Figure 5-5) on {empno,ename}
. We’ll name this table T9
. The one at the right (T10
) represents the extension of T9
with the attribute initial
. For every tuple, the value of this attribute is equal to the first character of the value of attribute ename
.
Note We’ll be using a function called substr
to yield a substring from a given string. The expression substr(<some string value>,n,m)
represents the substring of <some string value>
that starts at position n
and is m
characters long. For instance, the expression substr('I like this book',13,4)
is equal to the string value 'book'
.
Given table T9
, you can formally specify table T10
as follows:
T10 := { e ∪ { (initial;substr(e(ename),1,1)) } | e∈ T9 }
For every tuple, say e
, in T9
, table T10
holds a tuple that is represented by the following expression:
e ∪ { (initial;substr(e(ename),1,1)) }
This expression adds an attribute-value ordered pair to tuple e
. The attribute that is added is initial
. The value attached to this attribute is substr(e(ename),1,1)
, which represents the initial of value e(ename)
.
Here’s another example. Figure 5-9 shows two tables. The one at the left (T11
) represents a department table over {deptno,dname}
. The one at the right (T12
) is the extension of T11
with an emps
attribute. For each tuple, the emps
attribute represents the number of employees found in table T6
(see Figure 5-5) that are assigned to the department represented by the tuple.
Given table T11
and T6
, you can formally specify table T12
as follows:
T12 := { d ∪ { (emps;#{ e | e∈T6 ∧ e(deptno)=d(deptno)}) } | d∈T11 }
Here we’ve used the cardinality operator (symbol #
) to count the number of employees in each department.
Aggregate operators operate on a set. They yield a (numeric) value by aggregation, of some expression, over all elements in the set. There are five common aggregate operators: sum, average, count, minimum, and maximum.
You were introduced to the sum operator in Chapter 2; you might quickly want to revisit Definition 2-12. You can apply the sum operator to a table. Here is an example. Given table T6
(see Figure 5-5), here is an expression yielding the sum of all salaries of employees working for the research department (deptno=10
):
(SUM x∈{ t | t∈T6 ∧ t(deptno)=10 }: x(sal))
The resulting value of the preceding sum aggregation is 24100
. Now take a look at the following expression that again uses table T6
:
{ e1 ∪ {(sumsal;(SUM x∈{ e2 | e2∈T6 ∧ e2(deptno)=e1(deptno) }: x(sal)))}
| e1 ∈ T6⇓{deptno} }
Here we first project T6
on {deptno}
and then extend the resulting table with a sumsal
attribute. For every tuple in this resulting table, we determine the value of this new attribute by computing the sum of the salaries of all employees who work in the department corresponding to the department represented by the tuple. Figure 5-10 shows the result of this expression.
We don’t need to introduce a new definition for the count operator. We can simply use the set-theory symbol #
(cardinality) to count the elements of a set. You already saw an example of this in the preceding “Extension” section; the formal specification of T12
is an example of the count aggregate operator.
Definition 5-6 defines the average aggregate operator.
Definition 5-6: The Average Operator (AVG) The average of an expression f(x)
, where x
is chosen from a given, non-empty set S
, and f
represents an arbitrary numeric function over x
, is denoted as follows:
(AVG x∈S: f(x))
For every element x
that can be chosen from set S
, the average operator evaluates expression f(x)
and computes the average of all such evaluations. Note that this operator is not defined in case S
is the empty set.
In the same way as earlier, Definition 5-7 defines the maximum (MAX
) and minimum (MIN
) aggregate operators.
Definition 5-7: The Maximum (MAX) and Minimum (MIN) Operators The maximum and minimum of an expression f(x)
, where x
is chosen from a given, non-empty set S
, and f
represents an arbitrary numeric function over x
, are respectively denoted as follows:
(MAX x∈S: f(x)) and (MIN x∈S: f(x))
For every element x
that can be chosen from set S
, the maximum and minimum operators evaluate expression f(x)
and compute the maximum, or respectively the minimum, of all such evaluations. Note that these operators are also not defined in case S
is the empty set.
Next to the (number valued) aggregate operators discussed so far, two other aggregate operators are worth mentioning. They are truth-valued aggregate operators; they yield TRUE
or FALSE
. You’ve already been introduced to these two in Chapter 3; they are the existential quantification and the universal quantification. They yield a Boolean value by instantiating some predicate over all elements in a set (used as the arguments) and computing the truth value of the combined disjunction or conjunction, respectively, of all resulting propositions.
We conclude this chapter with a few examples that again use T6
. Listing 5-8 displays expressions involving these operators and also supplies the values that they yield.
/* Minimum salary of employees working in department 10 */
(MIN x ∈ { e |e∈T6 ∧ e(deptno)=10 }: x(sal))
= 2100
/* (rounded) Average salary of all employees */
(AVG x∈ T6 : x(sal))
= 5114
/* Table over {deptno,minsal,maxsal} representing the minimum and maximum
salary per department (number) */
{ e1 ∪ { (minsal; (MIN x ∈ { e2 |e2∈T6 ∧ e2(deptno)=e1(deptno) }: x(sal))),
(maxsal; (MAX x ∈ { e2 |e2∈T6 ∧ e2(deptno)=e1(deptno) }: x(sal))) }
| e1∈T6⇓{deptno} }
= { {(deptno;10), (minsal;2100), (maxsal;8500)}
, {(deptno;12), (minsal;2700), (maxsal;6000)} }
This section provides a summary of this chapter, formatted as a bulleted list. You can use it to check your understanding of the various concepts introduced in this chapter before continuing with the exercises in the next section.
x
tables, can formally be represented as a function containing x
ordered pairs. In every ordered pair, the first coordinate represents the name of a table structure and the second coordinate represents the current table for that table structure.∪
), intersection (∩
), and difference (–
) with tables as their operands.⇓
) to limit all tuples to a given subset of the heading of the table.⊗
) combines compatible tuples from two given tables. Because it is based on the compatibility concept, it requires the attributes that are to be used for this combination to have matching names.◊◊
) enables you to selectively rename one or more attributes in the heading of a table. You can use it to ensure that attributes of two tables have matching names, prior to performing the join of these two tables.Figure 5-11 introduces three tables P
, S
, and SP
, which will be used in the exercises.
chr_PART
, in which
250
.400
cannot be a drill
.10
, 15
, and 20
there are never more than 42
items in stock.E1 := P∪{ {(partno;201)} }
E2 := P∪{ {(partno;201),(price;35),(pname;'Optical mouse')} }
E3 := S–{ {(location;'DALLAS')} }
E4 := S–{ s | s∈S ∧ (∃sp∈SP: sp(suppno) = s(suppno) ∧ sp(available)>150) }
E5 := SP–{ sp | sp∈SP ∧ ¬(∃p∈P: p(partno)=sp(partno)) }
E6 := S⊗SP
E7 := P⊗SP
E8 := S⊗P
E9 := S⊗(P⊗SP)
E10 := SP⊗(S⇓{suppno,location})
E11 := (SP∞{(suppno;suppno),(partno;partno),(reserved;available),
(available;reserved)})⇓{suppno,reserved}
P1 := ( ∀p1∈P: ( ∀p2∈P: p1(partno) ≠p2(partno) ) )
P2 := ( ∀p1∈P: ( ∀p2∈P: p1<>p2 ⇒p1(partno) ≠p2(partno) ) )
P3 := ( ∀sp∈SP: ( ∃p∈P: sp(partno) = p(partno) ) )
P4 := { s(suppno)| s∈S } = { sp(suppno) | sp∈SP }
P5 := P⇓{partno} = SP⇓{partno}
P6 := ( ∀sp1∈SP: sp1(available)=0 ⇒
( ∃sp2∈SP: sp2(partno)=sp1(partno) ∧ sp2(suppno)≠sp1(suppno) ∧
sp2(available)>0 ) )
P7 := (P⊗SP)⇓{partno,pname,price} = P
T6
(see Figure 5-5), what values do the following two expressions yield?
(SUM x∈{ t | t∈T6 ∧ t(deptno)=10 }: x(sal))
(SUM x∈{ t(SAL) | t∈T6 ∧ t(deptno)=10 }: x)
Explain the difference between these two expressions.
18.188.98.148