images

From this chapter onward we’ll assume that you’re thoroughly familiar with all terminology introduced in Part 1 of this book. Only if we deem it necessary will we make explicit references back to where specific terminology was first introduced and defined.

We’ll start with an introduction of various interrelated terms that are used in this chapter and the other chapters of Part 2 that follow.

In Part 1 we hinted that a table can be considered a set of functions. In this chapter we’ll demonstrate this in detail. The section “Formal Specification of a Table” shows how you can formally represent a table. It also introduces a shorthand notation for a table and defines what’s called the heading of a table. We’ll familiarize you with a method to formally specify—in a predicative way—a table given a characterization. You already saw a preview of this in Chapter 4.

The section “Database States” deals with formally specifying a database state. The database state that is the current value of the database variable holds all tables of a database. This section also defines the concept of a database skeleton.

The section “Operations on Tables” discusses various common table operators: union, intersection, difference, projection, restriction, join, attribute renaming, extension, and aggregation. These operators take tables as their operands and produce another table.

You’ll find a “Chapter Summary” at the end, followed by an “Exercises” section.

Terminology

This section introduces various interrelated terms that we’ll use in the chapters of Part 2. Because some of these terms might have a meaning to you that differs from the one we intend to use in this book, or because some of these terms are new to you, we decided to introduce them here in advance. This introduction is informal. Two of these terms (database design and table design) were already used in Chapter 4. Some of the terms are formal terms; we’ll specify these precisely in this part of the book.

imagesNoteYou might find the informal introduction of some of these terms somewhat vague. If so, then don’t worry about it now. Just start reading the chapters of Part 2 and revisit this informal introduction of the terms later.

Database Design

Database design is the informal term for the blueprint of a set of data structures that are used to store business data, including the integrity requirements of this data. In this book, these blueprints will always be relational designs; they will conform to the relational model of data. Among other things, this implies that the aforementioned data structures are table data structures; that is, the data is stored via tables (using the meaning of a table as introduced hereafter).

We’ll often use a business term to indicate what particular database design we’re talking about; for instance, the HR (human resources) database design, or the education department database design.

imagesNote The other meaning often attributed to database design is the process of developing such a blueprint. This process includes activities such as requirements analysis, conceptual database design, logical database design, and physical database design. Using this other meaning, you can say that this book refers to a database design as the end result of the logical database design (process). In Part 3 (Chapter 11) we’ll look into the physical database design using an SQL DBMS.

Database Variable

Database variable is an implementation term. It is a variable (that is, a holder for a value) that is capable of storing actual values for the data structures described by a database design. In Chapter 7 we’ll provide a mathematical treatment of the type of—or value set for—a database variable. This value set is referred to as the database universe. At any given point in time and space, a database variable will hold an element of this value set as its current value. The elements of a database universe are referred to as database states. Note that a database consists of exactly one database variable. In a sense, that variable is the database.

Like every variable, the database variable will have a name, allowing us to talk about it without knowing which database state it currently holds.

imagesNote An SQL DBMS does not supply the concept of a database variable. Instead, SQL DBMS vendors have implemented the concept of table variables. A table variable can hold a table as its value; in an SQL DBMS you create one through the create table statement. In these DBMSes, a database design is implemented by creating as many table variables as the database design has table structures, and defining their respective types (instead of defining just one database variable and defining its type).

Database Universe

Database universe is a formal term. It is the mathematical construct used to formally specify the table data structures and integrity requirements of a database design. A database universe is the main part of the formal specification of a database design; it precisely describes the data values (tables) that we allow to be stored by the table data structures. Chapter 7 will establish the formal definition of a database universe; for now just think of it as a set of all values that we allow for a given database variable to hold over the course of time.

We’ll often use a shorthand name for a given database universe (just symbols) for convenience of reference in the text.

Database State

Database state is a formal term. It is a mathematical construct used to formally specify a value that can be held by a database variable. This chapter will establish the formal definition of a database state. We’ll often use a shorthand name for a given database state for convenience of reference in the text.

Database

Database is an informal term for the actual (business) data that is stored in a computer in a structured way (that is, conforming to the database design). Abstractly, a database is the current value of the database variable; it is the informal term for database state.

Database Management System (DBMS)

A DBMS is a computer program used to manage, query, and update the value of a database variable.

Table Design

Table design is the informal term for the blueprint of one of the table data structures of a database design. A relational database design typically consists of several table designs and a set of constraints (see Chapter 7) addressing the integrity requirements for the database. It is often convenient to talk about a single table design, and for this reason we’ll use a business term to indicate what particular table design we’re talking about; for instance, the employee table design, or the department table design.

Table Structure

Table structure, short for table data structure, is an implementation term. It is used to refer to the data structure that implements a specific table design. As mentioned before in the implementation of a database design, you are accustomed to using SQL table variables to implement the separate table data structures. You could regard a table structure as a table variable. However, we deliberately do not use the term table variable in this book; the methodology developed in this book only knows the concept of a database variable. A table structure can be considered a specific subsection of the database variable: the subsection that holds the current table for only one of the involved table designs.

In Chapter 10 we’ll discuss a formal way of specifying updates (value changes) of a database variable; we consider updating just one of the table structures of a database as an update of the database variable. However, when explaining our formal methodology we’ll often conveniently use the term table structure as if it were a separately addressable (table) variable, and we’ll even give such a table data structure a name—as we would give a name to a table variable—enabling us to talk about it without knowing which value is currently held by the data structure.

Table

Table is a formal term. It is a mathematical construct used to formally specify a value that can be held by a table structure that implements a specific table design (or if you will, a value that can be held by a table variable). We’ll formally establish this term in the next section. We’ll often use a shorthand name for a given table for convenience of reference in the text.

Tables

In Chapter 4 you saw that you can use a tuple (a set of ordered pairs) to describe a true proposition in the real world. For instance, the tuple { (empno;100), (ename;'Smith'), (job;'MANAGER') } might represent this proposition: “Employee with number 100 has name Smith and is employed as a MANAGER.” In this section we’ll build on this use of tuples and introduce you to a formal way of specifying a table.

Formal Specification of a Table

A table represents many (zero or more) true propositions of the same kind. If you can use a tuple to represent a single proposition, then evidently a table can be regarded as a set of tuples. Take a look at the set of tuples, which we have named T1, in Listing 5-1.

Listing 5-1. A Set of Tuples

T1 :=
{ { (partno;1), (name;'hammer'),      (instock;22), (price;10) }
 ,{ (partno;2), (name;'screwdriver'), (instock;19), (price;5)  }
 ,{ (partno;3), (name;'axe'),         (instock;0),  (price;30) }
 ,{ (partno;4), (name;'saw'),         (instock;4),  (price;15) }
 ,{ (partno;5), (name;'wrench'),      (instock;7),  (price;20) }
 ,{ (partno;6), (name;'scissors'),    (instock;32), (price;5)  } }

Set T1 contains six tuples; in other words, six functions (recall that a tuple can be regarded as a function). All six functions have the same domain: {partno,name,instock,price}. We’ll refer to a set of functions that all share the same domain as a table.

The tuples in T1 represent six propositions. They can be regarded as instantiations of the following external predicate, for instance: “The part with part number partno is named name, instock items of it are currently held in stock, and its price is price.”

imagesNote The English word “table” here is being used in a different sense than its normal meaning (twodimensional picture of columns and rows). In this book we’ll use table to denote a set of functions. However, you’ll see that a table is usually represented in columns and rows.

Definition 5-1 describes the concept of a table in a formal way. It uses the concept of “a function over a given set” (see Definition 4-2).

imagesDefinition 5-1: Table If T and H are sets, then “T is a table over H⇔( ∀t∈T: t is a function over H ).

This definition is generic in the sense that no restrictions are imposed upon the elements of set H. However, in practice we’ll only be interested in those cases where H is a set of names representing attributes.

Table T1 can be considered a parts table. It is a table over {partno,name,instock,price}, consisting of six tuples, each representing information about a different part. It holds for each such part the part number, its name, how many items of the part are in stock, and the price of the part.

Here are a few more examples:

T2 := { { (X;2), (Y;1) }, { (Y;8), (X;0) }, { (Y;10), (X;5) } }
T3 := { { (partno;3), (name;'hammer') }, { (pno;4), (pname;'nail') } }
T4 := { { (empno;105), (ename;'Mrs. Sparks'), (born;'03-apr-1970') }
       ,{ (empno;202), (ename;'Mr. Tedesco') } }

T2 is indeed a table. It holds three functions, all of which share the domain {X,Y}. It is a table over {X,Y}. Note that the order of the pairs (inside the functions) doesn’t matter. T3 is not a table. It is a set of functions; however, the domain of the first function is {partno,name}, which differs from the domain of the second function: {pno,pname}. Likewise, T4 is also not a table. It is a set of functions; however, the domain of the first function is {empno, ename, born}, which differs from the domain of the second function: {empno,ename}.

An element of a table is a function, and each such function is referred to specifically as a tuple. In Chapter 4 you were introduced to this term when we introduced the generalized product of a characterization (see the section “The Generalized Product of a Set Function”). The generalized product of a characterization is in fact a table; it holds functions, all of which share the same domain.

A table is a set, and the elements of this set are tuples. By the definition of a set, this implies that every tuple is unique within that set; no tuple can appear more than once in the same table.

If T is a table over H, then every proper, non-empty subset of T (containing fewer tuples than T) is of course also a table over H.

Even the empty set () is a table. In fact, under Definition 5-1 (which quantifies over the elements in the table), the empty set is a table over any set. You might want to revisit the rewrite rules in Table 3-2 for this.

imagesNote The empty set is often used as the initial state for a given table structure.

Shorthand Notation

Writing down a table using the formal enumerative method as introduced in Listing 5-1 is quite elaborate. For every tuple that’s an element of the table, you’re essentially repeating the (shared) domain as the first coordinates of the ordered pairs. To avoid this repetition, it’s common to draw a picture of a table. In this picture you list the names of the attributes of the tuples only once (as column headers) and under those you then list the attribute values for every tuple (one per row). Figure 5-1 shows this shorthand notation of a table. It does so for table T1 introduced in Listing 5-1.

images

Figure 5-1. Shorthand notation for table T1

As you can see, this looks like a table in the common language sense (a two-dimensional picture of columns and rows). But remember, a table—in this book—is a set of functions. This has two important implications. First, this means that the order in which functions are enumerated is arbitrary; it does not matter. And second, the order in which the ordered pairs are enumerated (within a tuple) does not matter either. Therefore, in the preceding shorthand notation, the order of the column headers (left to right) and the order of the rows (top to bottom) don’t matter.

imagesNote In the shorthand notation, an ordering to the attributes has been introduced because the order of attribute values in each tuple now has to correspond to the ordering of the column headings.

The shorthand notation demonstrates some other terminology that is often used when dealing with tables. As you can see, table T1 is a table over {partno,name,instock,price}. This shared domain of the tuples in T1 is referred to as the heading of table T1 (see Definition 5-2).

imagesDefinition 5-2: Heading of a Table If “T is a table over H” then H is referred to as the heading of T.

We’ll often use the shorthand notation in the remainder of this book to illustrate a particular table. However, you should never forget that it is only a shorthand notation. In this book, a table is formally defined as a set of functions, all of which share the same domain.

This formal definition of a table enables you to deal with operations on tables (later in this chapter) and data integrity predicates (discussed in the next chapter) in a clear and formal way too.

Table Construction

You can construct a table from a given set function by applying the generalized product to it. Let’s demonstrate this with an example. Listing 5-2 displays a set function called F1.

Listing 5-2. Set Function F1

F1 := { (X; {0,1,2})
       ,(Y; {0,1,2})
       ,(Z; {-1,0,1}) }

The generalized product of set function F1 will result in a set of twenty-seven functions (three times three times three). Listing 5-3 shows this result.

Listing 5-3. The Generalized Product of Set Function F1

Π(F1) :=
     { { (X;0), (Y;0), (Z;-1) }, { (X;0), (Y;0), (Z;0) }, { (X;0), (Y;0), (Z;1) }
     , { (X;0), (Y;1), (Z;-1) }, { (X;0), (Y;1), (Z;0) }, { (X;0), (Y;1), (Z;1) }
     , { (X;0), (Y;2), (Z;-1) }, { (X;0), (Y;2), (Z;0) }, { (X;0), (Y;2), (Z;1) }
     , { (X;1), (Y;0), (Z;-1) }, { (X;1), (Y;0), (Z;0) }, { (X;1), (Y;0), (Z;1) }
     , { (X;1), (Y;1), (Z;-1) }, { (X;1), (Y;1), (Z;0) }, { (X;1), (Y;1), (Z;1) }
     , { (X;1), (Y;2), (Z;-1) }, { (X;1), (Y;2), (Z;0) }, { (X;1), (Y;2), (Z;1) }
     , { (X;2), (Y;0), (Z;-1) }, { (X;2), (Y;0), (Z;0) }, { (X;2), (Y;0), (Z;1) }
     , { (X;2), (Y;1), (Z;-1) }, { (X;2), (Y;1), (Z;0) }, { (X;2), (Y;1), (Z;1) }
     , { (X;2), (Y;2), (Z;-1) }, { (X;2), (Y;2), (Z;0) }, { (X;2), (Y;2), (Z;1) } }

In this result set, every function has the same domain {X,Y,Z}; the result set is therefore a table over {X,Y,Z}. Now take a look at the following, more realistic, example. Listing 5-4 shows a characterization of a part. For a part, the attributes of interest are the number of the part (partno), its name (name), the quantity in stock of this part (instock), and the part’s price (price).

Listing 5-4. Characterization of a Part

chr_PART := { (partno;  [1..999])
             ,(name;    varchar(12))
             ,(instock; [0..99])
             ,(price;   [1..500]) }

The generalized product of chr_PART is a rather large table; it contains all possible tuples that can be generated using the given attribute-value sets that are introduced in the definition of chr_PART. Note that every tuple inside table T1—introduced at the beginning of this chapter— is an element of Π(chr_PART). This makes T1 a subset of Π(chr_PART).

imagesNote Every subset of Π(chr_PART)—not just T1—is a table over {partno,name,instock,price}.

In fact, every tuple in table T1 is an element of the following set (named T2) that is based on Π(chr_PART):

T2 := { t | t∈Π(chr_PART) ∧ ( t(price) ≥ 20 ⇒ t(instock) ≤ 10 )
                         ∧ ( t(price) ≤ 5 ⇒ t(instock) ≥ 15 )
      }

Inside the definition of T2, two predicates are introduced that condition the contents of T2. The first condition states that if the price of a part is 20 or more, then the quantity in stock for this part should be 10 or less. The second condition states that if the price of a part is 5 or less, then the quantity in stock for this part should be 15 or more. T2—a table over {partno,name,instock,price}—will hold all and only those tuples of Π(chr_PART) for which both these two conditions are true. Note that because Π(chr_PART) will hold many tuples that violate one or both of these conditions, T2 is a proper subset of Π(chr_PART). Because all tuples of T1 conform to these two conditions, T1 is a subset of T2. In fact, it is a proper subset of T2.

In Chapter 7 we’ll revisit this way of defining T2; that is, taking the generalized product of a characterization and “plugging in” additional predicates.

Database States

A database is a representation of the state of affairs of some organization. It consists of a table for every kind of proposition about this organization that we would like to record in the database. This section introduces you to a formal way of specifying a database via a database state.

Formal Representation of a Database State

Let’s consider a simple database design involving two table structures: one for employees and one for departments. Take a look at tables EMP1 and DEP1, which are displayed in Figures 5-2 and 5-3.

images

Figure 5-2. Example employee table EMP1

images

Figure 5-3. Example department table DEP1

Let’s assume that tables EMP1 and DEP1 represent the current state of the employee and department table structures, respectively. We can formally specify the database state consisting of tables EMP1 and DEP1 as a function. In this function, the first coordinates of the ordered pairs represent the table structure names and the second coordinates hold the (current) values—tables EMP1 and DEP1—for these table structures. Listing 5-5 gives the formal specification of this example database state.

Listing 5-5. The Database State DBS1 Holding Tables EMP1 and DEP1

DBS1 :=
  { (EMPLOYEE;
     { {(empno;101),(ename;'Chris'), (job;'MANAGER'),(sal;7900),(deptno;10)}
      ,{(empno;102),(ename;'Kathy'), (job;'TRAINER'),(sal;6000),(deptno;12)}
      ,{(empno;103),(ename;'Thomas'),(job;'CLERK'),  (sal;2100),(deptno;10)}
      ,{(empno;104),(ename;'David'), (job;'TRAINER'),(sal;5600),(deptno;10)}
      ,{(empno;105),(ename;'Renu'),  (job;'CLERK'),  (sal;3000),(deptno;12)}
      ,{(empno;106),(ename;'Bob'),   (job;'MANAGER'),(sal;8500),(deptno;10)}
      ,{(empno;107),(ename;'Sue'),   (job;'CLERK'),  (sal;2700),(deptno;12)} }
    )
   ,(DEPARTMENT;
     { {(deptno;10),(dname;'RESEARCH'),(loc;'DENVER'),        (salbudget;50000)}
      ,{(deptno;11),(dname;'SALES'),   (loc;'DENVER'),        (salbudget;20000)}
      ,{(deptno;12),(dname;'SUPPORT'), (loc;'LOS ANGELES'),   (salbudget;40000)}
      ,{(deptno;13),(dname;'SALES'),   (loc;'SAN FRANCISCO'), (salbudget;20000)} }
    )
  }

imagesNote We could have also listed DBS1 to equal the set {(EMPLOYEE;EMP1), (DEPARTMENT;DEP1)}.

Database state DBS1 is a function containing just two ordered pairs. The first coordinate of the first ordered pair listed is EMPLOYEE (the name we chose for the employee table structure), and the corresponding second coordinate is table EMP1. Likewise, in the second ordered pair you’ll notice that we chose DEPARTMENT as the name for the department table structure. In this case, the second coordinate is table DEP1.

Given this definition of function DBS1, you can now—using function application—refer to expressions such as DBS1(EMPLOYEE), which denotes table EMP1, and DBS1(DEPARTMENT), which denotes table DEP1.

Database Skeleton

To specify a database state, you not only need actual tables (EMP1 and DEP1 in the preceding example), but you also need to decide upon names for the table structures (EMPLOYEE and DEPARTMENT in the preceding example).

You probably won’t be surprised by now that the formal specification of a database design (which we’ll demonstrate in Chapter 7) also holds the specification of a characterization for every table design involved. You choose the names of the attributes involved in a table design when you specify the characterization for that table design.

A database skeleton collects all these names—for the table structures and the involved attributes—into a single formal structure. A database skeleton is a set function with an ordered pair for every table structure. Every first coordinate introduces the table structure name, and the second coordinate introduces the set of names of the involved attributes.

Listing 5-6 displays the database skeleton for the employee/department database design introduced in the previous section.

Listing 5-6. The Database Skeleton SK1

SK1 :=
  { (EMPLOYEE; {empno,ename,job,sal,deptno} )
   ,(DEPARTMENT; {deptno,dname,loc,salbudget} ) }

As you can see, set function SK1 introduces the names EMPLOYEE and DEPARTMENT for the table structures involved in the employee/department database design, and it attaches the set of names of the relevant attributes to them.

imagesNote You should carefully choose the names introduced by the database skeleton, because they not only constitute the vocabulary between you (the database professional) and your customer (the users), they are also the first stepping stone to understanding the meaning (semantics) of a database design. You’ll see in the following chapters that data integrity constraints form a further important stepping stone for the understanding of the semantics of the database design.

Operations on Tables

This section covers some important table operators. You will apply these operators when specifying queries and transactions (Chapters 9 and 10), or certain types of predicates (Chapters 6, 7, and 8).

We’ll first take a look at the well-known set operators union, intersection, and difference. Next we’ll investigate the projection and restriction of a table, followed by the join—an important operator in the database field—and closely related to the join, the attribute renaming operator. Finally, we’ll deal with extension and aggregation.

Union, Intersection, and Difference

Because tables are sets, you can apply the well-known set operators, union, intersection, and difference, with tables as their operands. This section will explore the application of these set operators on tables.

We’ll use three example tables, named E1, E2, and E3, in the following sections. Figure 5-4 displays these tables.

images

Figure 5-4. Example tables E1, E2, and E3

As you can see, E1 and E2 are both tables over {EMPNO,ENAME,JOB}, and E3 is a table over {E#,NAME,JOB,SAL}. All three tables represent information about employees. For E3, some of the names of the attributes were chosen differently, and E3 holds additional information (the salary). Employee 102 occurs in both table E1 and table E2 (with the same attribute values). Employee 101 occurs in both table E1 and table E3.

Union

As you probably know, the union of two sets holds all objects that are either an element of the first set, or an element of the second set, or an element of both. Here is the union of tables E1 and E2, denoted by E1∪E2:

E1∪E2 =
     { {(EMPNO;101), (ENAME;'Anne'),   (JOB;'TRAINER')  }
      ,{(EMPNO;102), (ENAME;'Thomas'), (JOB;'SALESMAN') }
      ,{(EMPNO;103), (ENAME;'Lucas'),  (JOB;'PRESIDENT')}
      ,{(EMPNO;104), (ENAME;'Pete'),   (JOB;'MANAGER')  } }

The union of E1 and E2 is a table over {EMPNO,ENAME,JOB}. It holds four tuples, not five, because the tuple of employee 102 is a member of both sets.

Now take a look at the union of tables E1 and E3 (E1∪E3):

E1∪E3 =
     { {(EMPNO;101), (ENAME;'Anne'),   (JOB;'TRAINER')            }
      ,{(EMPNO;102), (ENAME;'Thomas'), (JOB;'SALESMAN')           }
      ,{(EMPNO;103), (ENAME;'Lucas'),  (JOB;'PRESIDENT')          }
      ,{(E#;101),    (NAME;'Anne'),    (JOB;'TRAINER'), (SAL;3000)}
      ,{(E#;102),    (NAME;'John'),    (JOB;'MANAGER'), (SAL;5000)} }

This result is a set of functions, but it clearly isn’t a table; not all the functions in this result set share the same domain.

Remember the closure property (in the section “Union, Intersection, and Difference” in Chapter 2)? We’re only interested in those cases where the union of two tables results in another table. The union operator is evidently not closed over tables in general. It’s only closed over tables if the operands are tables that have the same heading. If the operands of the union operator are non-empty tables over different headings, then the resulting set won’t be a table.

Note the special case where the empty table is involved as an operand. The union of a given table with the empty table () always results in the given table; because is a table over any heading, the closure property holds.

Intersection

The intersection of two sets holds all objects that are an element of the first set and an element of the second set. Here’s the intersection of tables E1 and E2 (E1∩E2):

E1∩E2 = { {(EMPNO;102), (ENAME;'Thomas'), (JOB;'SALESMAN')} }

The intersection of E1 and E2 is a table. It’s probably not difficult to see that the intersection of two tables with the same heading will always result in another table. Note that the intersection is also closed over tables when the operands are tables over different headings. However, the intersection is useless in these cases, because it then always results in the empty table; you might want to check this by investigating the intersection of E1 and E3 (tables with different headings).

You can meaningfully intersect tables E1 and (part of ) E3, but first you’d have to transform one of these in such a way that it has the same heading as the other table. The concepts that enable you to do so have all been introduced in Chapter 4: function limitation and function composition.

Take a look at the following definition for table E4. It renames attributes E# and NAME of table E3.

E4 := { e◊{(EMPNO;E#),(ENAME;NAME),(JOB;JOB),(SAL;SAL)} | e∈E3 }

Set E4 is a table over {EMPNO,ENAME,JOB,SAL}. We used function composition (; see Definition 4-8) to rename two attributes. Attribute E# is renamed to EMPNO and attribute NAME is renamed to ENAME. The other two attributes (JOB and SAL) are left untouched.

Next we need to get rid of the extra SAL attribute (which is also not part of the heading of E1). For this we use function limitation (, see Definition 4-4). Take a look at the definition of E5:

E5 := { e↓{EMPNO,ENAME,JOB} | e∈E4 }

E5 equals the following set:

{ {(EMPNO;101), (ENAME;'Anne'), (JOB;'TRAINER')}
 ,{(EMPNO;102), (ENAME;'John'), (JOB;'MANAGER')} }

The intersection of E5 with E1 has now become meaningful, and results in the following set:

E5∩E1 = { {(EMPNO;101), (ENAME;'Anne'), (JOB;'TRAINER')} }

Last, we note the special case where the empty table is involved as an operand. The intersection of a given table with the empty table always results in the empty table.

Difference

The difference of two sets holds all objects that are an element of the first set and that are not an element of the second set. Here is the difference of tables E1 and E2 (E1–E2):

E1–E2 =
      { {(EMPNO;101), (ENAME;'Anne'),  (JOB;'TRAINER')  }
       ,{(EMPNO;103), (ENAME;'Lucas'), (JOB;'PRESIDENT')} }

Again, as you can see, the result is a table over {EMPNO,ENAME,JOB}. The difference of two tables with the same heading always produces another table. Like the intersection, the difference operator is also closed over tables when the operands are tables over different sets. However, here too, in these cases the difference is useless because it always results in the first table; the second table cannot have tuples that are in the first table (due to the different headings). Hence, tuples will never be “removed” from the first set.

Because the difference operator is not commutative, we note the two special cases when the empty set is involved as an operand (in contrast with the preceding intersection and union). Let T be a given table; then T–∅ always results in a given table T, and ∅–T always results in the empty table.

Projection

Another important table operator that needs to be introduced is the projection of a table on a given set. The projection of a given table—say T—on a given set—say B—performs the limitation of every tuple in T on set B. We use symbol to denote projection. The projection operator can be viewed as a version of the limitation operator that has been lifted to the table level. Definition 5-3 formally defines this operator.

imagesDefinition 5-3: Projection of a Table Let T be a set of functions and B a set. The projection of T on B, notation T⇓B, is defined as follows:

T⇓B := { t↓B | t∈T }

The projection of T on B holds every function in T limited to B. Although the preceding definition describes the projection for each set of functions T and each set B, we are mainly (but not exclusively) interested in those cases in which T is a table, and moreover in which B is a (non-empty) proper subset of the heading of such a table.

Let’s take a look at an example to illustrate the concept of projection. Listing 5-7 introduces table T3. It holds five tuples with domain {empno,ename,salary,sex,dno}.

Listing 5-7. Table T3

T3 := { {(empno;10), (ename;'Thomas'),    (salary;2400), (sex;'male'),   (dno;1)}
       ,{(empno;20), (ename;'Lucas'),     (salary;3000), (sex;'male'),   (dno;1)}
       ,{(empno;30), (ename;'Aidan'),     (salary;3000), (sex;'male'),   (dno;2)}
       ,{(empno;40), (ename;'Keeler'),    (salary;2400), (sex;'male'),   (dno;1)}
       ,{(empno;50), (ename;'Elizabeth'), (salary;5600), (sex;'female'), (dno;2)} }

Here’s the result of the projection of T3 on {salary,sex}, denoted by T3⇓{salary,sex}:

T3⇓{salary,sex} =
      { {(salary;2400), (sex;'male')}
      , {(salary;3000), (sex;'male')}
      , {(salary;5600), (sex;'female')} }

Note that the projection of T3 on {salary,sex} results in another table: a table over {salary,sex}. This table has only three elements, whereas T3 has five elements. The first and the fourth tuple enumerated in the definition of T3 result in the same (limited) function, as do the second and the third function enumerated in T3. Therefore, only three tuples remain in the resulting set of this projection.

You can now specify E5 introduced in the section “Intersection” with E4⇓{EMPNO,ENAME,JOB}.

Restriction

Tables typically hold many tuples. Often you’re only interested in some of these tuples: tuples that have a certain property. You want to look at a subset of the tuples in the given table. You can derive a subset of a given table through table restriction. This operator does not require a new mathematical symbol; you can simply use the predicative method to specify a new set of tuples (that is, a new table) that’s based on the given table.

Using the employee table T3 from Listing 5-7, let’s assume you want to restrict that table to only male employees who have a salary greater than 2500. Here is how you would specify that (we have named this result T4):

T4 := { e | e∈T3 ∧ e(sex)='male' ∧ e(salary)>2500 }

Table T4 has two tuples: the ones representing employees 'Lucas' and 'Aidan', who are the only male employees in T3 that earn more than 2500.

The specification of T4 is an example of a simple case of table restriction, and one you will use a lot. Here is the general pattern for this kind of restriction:

{ t | t∈T ∧ P(t) }

In this expression, T represents the table that is being restricted and P(t) represents a predicate with one parameter of some tuple type. Predicate P can be a simple predicate or a compound predicate (that is, one that involves logical connectives). This predicate will typically have expressions that involve function application using argument t; for each tuple of T you can decide if it remains in the result set by inspecting one or more attribute values of (only) that tuple.

Restrictions of a table can be a bit more complex. For instance, let’s assume that we want to restrict table T3 to only the male employee who earns the most (among all males) and the female employee who earns the most (among all females). A way to specify this restriction of T3, which we will name T5, is as follows:

T5 := { e | e∈T3 ∧ ¬(∃e2∈T3: e2(sex)=e(sex) ∧ e2(sal)>e(sal)) }

Table T5 will have every tuple (say e) of T3 for which there is no other tuple in T3 (say e2), such that e and e2 match on the sex attribute, and the sal value in e2 is greater than the sal value in e. Note that this particular restriction actually results in a table of three tuples; there are two male employees in T3 who both earn the most.

This more complex case of restricting a table conforms to another pattern that is often applied. Here is the general pattern for this kind of restriction:

{ t | t∈T ∧ P(t,T) }

In this pattern, P(t,T) represents a predicate with two parameters. The first parameter takes as a value a tuple from table T, and the second one takes as a value the actual table T. For each tuple of T, you decide if it remains in the result set, not only by inspecting one or more attribute values of that tuple, but also by inspecting other tuples in the same table.

Restrictions can also involve other tables (other than the table being restricted). Say we have a table named D1 over {dno,dname,loc}, representing a department table (with the obvious semantics). Let’s assume that we want to restrict table T3 to only the male employees who earn more than 2500 and who are employed in a department that is known in D1 and located in San Francisco ('SF'). A way to specify this restriction of T3 is as follows:

{ e | e∈T3 ∧ e(sex)='male' ∧ e(sal)>2500 ∧
              (∃d∈D1: e(dno)=d(dno) ∧ d(loc)='SF') }

This case of restricting a table conforms to the following pattern:

{ t | t∈T ∧ P(t,S) }

Here S represents some given table (that differs from T). For each tuple of T, you now decide if it remains in the result set by not only inspecting one or more attribute values of that tuple, but also by inspecting tuples in another table.

As you’ll understand by now, there are many more ways to perform a restriction of a given table; any number of other tables (including the table being restricted) could be involved.

In practice it’s also generally possible to specify such a restriction on a table that is the result of a join (see the next section).

Join

In a database you’ll often find that two tables are related to each other through some attribute (or set of attributes) that they share. If this is the case, then you can meaningfully combine the tuples of these tables.

This combining of tuples is done via the compatibility concept of two tuples (you might want to go back to Definition 4-3 and refresh your memory). Every tuple from the first table is combined with the tuple(s) from the second table that are compatible with this tuple. Let’s demonstrate this with an example. Figures 5-5 and 5-6 introduce table T6 (representing employees) and table T7 (representing departments).

images

Figure 5-5. Example table T6

images

Figure 5-6. Example table T7

You can combine tuples from tables T6 and T7 that correspond on the attributes that they share, in this case the deptno attribute. The resulting table from this combination has as its heading the union of the heading of T6 and the heading of T7.

Figure 5-7 displays the result of such a combination of tables T6 and T7.

images

Figure 5-7. Table T8, the combination of tables T6 and T7

You can formally specify this table in a predicative way as follows:

{ e∪d | e∈T6 ∧ d∈T7 ∧ e(deptno)=d(deptno) }

Every tuple from T6 is combined with the tuples in T7 that have the same deptno value. This combining of tuples of two “related” tables is known as the join of two such tables. Definition 5-4 formally defines this operator.

imagesDefinition 5-4: The Join of Two Tables Let R and T be two (not necessarily distinct) tables. The join of R and T (notation R⊗T) is defined as follows:

R⊗T := { r∪t | r∈R ∧ t∈T ∧ "r and t are compatible" }

The join of two tables, say R and T, will hold the union of every pair of tuples r and t, where r is a tuple in R, and t is a tuple in T, such that the union of r and t is a function (compatibility concept). With this definition, you can now specify T8 (from Figure 5-7) as T6⊗T7.

The set of attributes that two tables, that are being joined, have in common is referred to as the set of join attributes. We will also sometimes say that two tuples are joinable, instead of saying that they are compatible.

As mentioned at the end of the previous section on restriction, you’ll often restrict the join of two tables. For example, here is the restriction of the join of tables T6 and T7 to only those tuples that represent clerks located in Denver:

{ t | t∈T6⊗T7 ∧ t(job)='CLERK' ∧ t(loc)='DENVER' }

Note that Definition 5-4 is not restricted to those cases in which the operands actually have at least one attribute in common. You’re allowed to join two tables for which the set of join attributes is empty (the compatibility concept allows for this); in this case every tuple from the first table is compatible with every tuple from the second table (this follows from Definition 4-3). You are then in effect combining every tuple of the first table with every tuple of the second table. This special case of a join is also referred to as a Cartesian join.

In general, a Cartesian join is of limited practical use in data processing. A Cartesian join that involves a table of cardinality one is reasonable and sometimes useful. Here is an example of such a Cartesian join:

T7⊗{ { (maxbudget;75000) } }

This expression joins every tuple of T7 with every tuple of the right-hand argument of the join operator, which is a table over {maxbudget}. It effectively extends every department tuple of T7 with a new attribute called maxbudget. This Cartesian join is in fact a special case of table extension, which will be treated shortly hereafter in the section “Extension.”

Note also an opposite special case of a Cartesian join, in which you join two tables that share the same heading. That is, the set of join attributes equals this shared heading. In such a join you are in effect intersecting these two tables. This makes the intersection a special case of the join (and therefore a redundant operator).

Attribute Renaming

Sometimes two tables are related to each other such that joining these two tables makes sense. However, the names of the attributes with which the joining should be performed don’t match; the set of join attributes is the empty set. In these cases, the join operator won’t work as intended; it will perform a Cartesian join. The way a join is defined requires the names of the intended join attributes to be the same. For instance, suppose the deptno attribute in the preceding table T7 was named dept#. Performing the join of T6 and T7T6⊗T7—would then result in a Cartesian join of T6 and T7. To fix this, you must first either rename attribute deptno in T6 to dept#, or rename attribute dept# in T7 to deptno.

Also, sometimes performing the intersection, union, or difference of two tables makes sense; however, the headings of the two tables differ, not in cardinality, but in the names of the attributes. To fix this too, you must first rename attributes of the tables involved such that they have equal headings.

For this, we introduce the attribute renaming operator; Definition 5-5 formally defines this operator.

imagesDefinition 5-5: The Attribute Renaming Operator Let T be a table and f be a function. The renaming of attributes in T according to f (notation T◊◊f) is defined as follows:

T◊◊f := { t◊f | t∈T }.

You can view the attribute renaming operator as a version of the function composition operator that has been lifted to the table level.

Using this operator, we can rename attribute deptno in T6 to dept# as follows:

T6◊◊{(empno;empno),(ename;ename),(job;job),(sal;sal),(dept#;deptno)}

Note that the renaming function f holds an ordered pair for every attribute in the heading of T6. All second coordinates in the ordered pairs represent the current attribute names of the table. Only if an attribute needs to be renamed do the first and second coordinates of the ordered pair differ; the first coordinate will have the new name for the attribute.

Extension

Sometimes you want to add attributes to a given table. You want to introduce new attributes to the heading of the table, and supply values for these new attributes for every tuple in the table. Performing this type of operation on a table is referred to as extending the table, or performing table extension. Table extension does not require a new mathematical symbol; you can simply use the predicative method to specify a new set of tuples (that is, a new table) that’s based on the given table.

Let’s take a look at an example of table extension. Figure 5-8 shows two tables. The one at the left is a table over {empno,ename}; it is the result of projecting table T6 (see Figure 5-5) on {empno,ename}. We’ll name this table T9. The one at the right (T10) represents the extension of T9 with the attribute initial. For every tuple, the value of this attribute is equal to the first character of the value of attribute ename.

imagesNote We’ll be using a function called substr to yield a substring from a given string. The expression substr(<some string value>,n,m) represents the substring of <some string value> that starts at position n and is m characters long. For instance, the expression substr('I like this book',13,4) is equal to the string value 'book'.

images

Figure 5-8. Example of extending a table with a new attribute

Given table T9, you can formally specify table T10 as follows:

T10 := { e ∪ { (initial;substr(e(ename),1,1)) } | e∈ T9 }

For every tuple, say e, in T9, table T10 holds a tuple that is represented by the following expression:

e ∪ { (initial;substr(e(ename),1,1)) }

This expression adds an attribute-value ordered pair to tuple e. The attribute that is added is initial. The value attached to this attribute is substr(e(ename),1,1), which represents the initial of value e(ename).

Here’s another example. Figure 5-9 shows two tables. The one at the left (T11) represents a department table over {deptno,dname}. The one at the right (T12) is the extension of T11 with an emps attribute. For each tuple, the emps attribute represents the number of employees found in table T6 (see Figure 5-5) that are assigned to the department represented by the tuple.

images

Figure 5-9. Another example of extending a table with a new attribute

Given table T11 and T6, you can formally specify table T12 as follows:

T12 := { d ∪ { (emps;#{ e | e∈T6 ∧ e(deptno)=d(deptno)}) } | d∈T11 }

Here we’ve used the cardinality operator (symbol #) to count the number of employees in each department.

Aggregation

Aggregate operators operate on a set. They yield a (numeric) value by aggregation, of some expression, over all elements in the set. There are five common aggregate operators: sum, average, count, minimum, and maximum.

You were introduced to the sum operator in Chapter 2; you might quickly want to revisit Definition 2-12. You can apply the sum operator to a table. Here is an example. Given table T6 (see Figure 5-5), here is an expression yielding the sum of all salaries of employees working for the research department (deptno=10):

(SUM x∈{ t | t∈T6 ∧ t(deptno)=10 }: x(sal))

The resulting value of the preceding sum aggregation is 24100. Now take a look at the following expression that again uses table T6:

{ e1 ∪ {(sumsal;(SUM x∈{ e2 | e2∈T6 ∧ e2(deptno)=e1(deptno) }: x(sal)))}
                                                           | e1 ∈ T6⇓{deptno} }

Here we first project T6 on {deptno} and then extend the resulting table with a sumsal attribute. For every tuple in this resulting table, we determine the value of this new attribute by computing the sum of the salaries of all employees who work in the department corresponding to the department represented by the tuple. Figure 5-10 shows the result of this expression.

images

Figure 5-10. Sum of T6, salaries per department

We don’t need to introduce a new definition for the count operator. We can simply use the set-theory symbol # (cardinality) to count the elements of a set. You already saw an example of this in the preceding “Extension” section; the formal specification of T12 is an example of the count aggregate operator.

Definition 5-6 defines the average aggregate operator.

imagesDefinition 5-6: The Average Operator (AVG) The average of an expression f(x), where x is chosen from a given, non-empty set S, and f represents an arbitrary numeric function over x, is denoted as follows:

(AVG x∈S: f(x))

For every element x that can be chosen from set S, the average operator evaluates expression f(x) and computes the average of all such evaluations. Note that this operator is not defined in case S is the empty set.

In the same way as earlier, Definition 5-7 defines the maximum (MAX) and minimum (MIN) aggregate operators.

imagesDefinition 5-7: The Maximum (MAX) and Minimum (MIN) Operators The maximum and minimum of an expression f(x), where x is chosen from a given, non-empty set S, and f represents an arbitrary numeric function over x, are respectively denoted as follows:

(MAX x∈S: f(x)) and (MIN x∈S: f(x))

For every element x that can be chosen from set S, the maximum and minimum operators evaluate expression f(x) and compute the maximum, or respectively the minimum, of all such evaluations. Note that these operators are also not defined in case S is the empty set.

Next to the (number valued) aggregate operators discussed so far, two other aggregate operators are worth mentioning. They are truth-valued aggregate operators; they yield TRUE or FALSE. You’ve already been introduced to these two in Chapter 3; they are the existential quantification and the universal quantification. They yield a Boolean value by instantiating some predicate over all elements in a set (used as the arguments) and computing the truth value of the combined disjunction or conjunction, respectively, of all resulting propositions.

We conclude this chapter with a few examples that again use T6. Listing 5-8 displays expressions involving these operators and also supplies the values that they yield.

Listing 5-8. Example Expressions Involving AVG,MAX, and MIN

/* Minimum salary of employees working in department 10 */
(MIN x ∈ { e |e∈T6 ∧ e(deptno)=10 }: x(sal))
   = 2100
/* (rounded) Average salary of all employees */
(AVG x∈ T6 : x(sal))
   = 5114
/* Table over {deptno,minsal,maxsal} representing the minimum and maximum
   salary per department (number) */
{ e1 ∪ { (minsal; (MIN x ∈ { e2 |e2∈T6 ∧ e2(deptno)=e1(deptno) }: x(sal))),          (maxsal; (MAX x ∈ { e2 |e2∈T6 ∧ e2(deptno)=e1(deptno) }: x(sal))) }
         | e1∈T6⇓{deptno} }
  = { {(deptno;10), (minsal;2100), (maxsal;8500)}
    , {(deptno;12), (minsal;2700), (maxsal;6000)} }

Chapter Summary

This section provides a summary of this chapter, formatted as a bulleted list. You can use it to check your understanding of the various concepts introduced in this chapter before continuing with the exercises in the next section.

  • A table can formally be represented as a set of functions, all of which share a common domain; you can specify this set by writing down all involved functions in the enumerative way.
  • The elements of a table are referred to as tuples; the ordered pairs of a tuple are called attribute-value pairs.
  • The shared domain of all tuples in a table is referred to as the heading of the table.
  • We usually draw a table as a two-dimensional picture of rows and columns (shorthand notation). The column header represents the heading of the table and the rows represent the tuples of the table.
  • The order in which tuples are drawn (shorthand notation) or enumerated (set-theory notation) doesn’t matter. Neither do the order of the columns or the order of the attribute-value pairs matter.
  • The generalized product of a characterization will result in a table. You can use this as the given set to specify tables in a predicative way; by adding predicates you further constrain the content of the set.
  • A database state, containing x tables, can formally be represented as a function containing x ordered pairs. In every ordered pair, the first coordinate represents the name of a table structure and the second coordinate represents the current table for that table structure.
  • A database skeleton formally describes the structure of a database design. It introduces the table structure names and the names of all attributes involved in the database design.
  • Because tables are sets, you can apply the well-known set operators union (), intersection (), and difference () with tables as their operands.
  • You can use the projection operator () to limit all tuples to a given subset of the heading of the table.
  • You can select a subset of the tuples from a given table by specifying, in a predicative way, a new set based on the given table. The predicates that you add in such a predicative specification are said to perform a table restriction.
  • The join operator () combines compatible tuples from two given tables. Because it is based on the compatibility concept, it requires the attributes that are to be used for this combination to have matching names.
  • The attribute renaming operator (◊◊) enables you to selectively rename one or more attributes in the heading of a table. You can use it to ensure that attributes of two tables have matching names, prior to performing the join of these two tables.
  • You can use table extension to add new attributes to a table (including their values for every tuple).
  • The five aggregate operators—sum, count, average, maximum, and minimum—enable you to compute a value by aggregation over all elements in a given set.

Exercises

Figure 5-11 introduces three tables P, S, and SP, which will be used in the exercises.

images

Figure 5-11. Example tables P, S, and SP

  1. Check for each of the following expressions whether it represents a table, and if so, over which set?
    1. P∪S
    2. P∩S
    3. (P⇓{partno}) ∪ (SP⇓{partno})
    4. S–SP
    5. P∪∅
  2. Formally specify a table that is based on characterization chr_PART, in which
    1. A hammer always costs more than 250.
    2. A part that costs less than 400 cannot be a drill.
    3. For parts 10, 15, and 20 there are never more than 42 items in stock.
  3. Evaluate the following expressions.
    E1 := P∪{ {(partno;201)} }
    E2 := P∪{ {(partno;201),(price;35),(pname;'Optical mouse')} }
    E3 := S–{ {(location;'DALLAS')} }
    E4 := S–{ s | s∈S ∧ (∃sp∈SP: sp(suppno) = s(suppno) ∧ sp(available)>150) }
    E5 := SP–{ sp | sp∈SP ∧ ¬(∃p∈P: p(partno)=sp(partno)) }
  4. Evaluate the following expressions.
    E6 := S⊗SP
    E7 := P⊗SP
    E8 := S⊗P
    E9 := S⊗(P⊗SP)
    E10 := SP⊗(S⇓{suppno,location})
    E11 := (SP∞{(suppno;suppno),(partno;partno),(reserved;available),
        (available;reserved)})⇓{suppno,reserved}
  5. Give an informal description of the following propositions and evaluate their truth values.
    P1 := ( ∀p1∈P: ( ∀p2∈P: p1(partno) ≠p2(partno) ) )
    P2 := ( ∀p1∈P: ( ∀p2∈P: p1<>p2 ⇒p1(partno) ≠p2(partno) ) )
    P3 := ( ∀sp∈SP: ( ∃p∈P: sp(partno) = p(partno) ) )
    P4 := { s(suppno)| s∈S } = { sp(suppno) | sp∈SP }
    P5 := P⇓{partno} = SP⇓{partno}
    P6 := ( ∀sp1∈SP: sp1(available)=0 ⇒
            ( ∃sp2∈SP: sp2(partno)=sp1(partno) ∧ sp2(suppno)≠sp1(suppno) ∧
                        sp2(available)>0 ) )
    P7 := (P⊗SP)⇓{partno,pname,price} = P
  6. What is the result of the projection of a given table on the empty set?
  7. Given table T6 (see Figure 5-5), what values do the following two expressions yield?
    (SUM x∈{ t | t∈T6 ∧ t(deptno)=10 }: x(sal))
    (SUM x∈{ t(SAL) | t∈T6 ∧ t(deptno)=10 }: x)

    Explain the difference between these two expressions.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.14.93