By means of a query, you retrieve data from a database. In this chapter we’ll demonstrate how you can specify a query in a formal way. To do so, we won’t require the formal introduction of any new concepts; we’ve already presented all ingredients for specifying queries.
The section “Formally Specifying Queries” introduces you to the formal concept of a query over a database universe. You’ll find out that you can specify a query as a set-theory expression based on a given database state. In this section, we’ll also give a few simple examples to familiarize you with this formal concept of a query.
The section “Example Queries Over DB_UEX” provides example queries over DB_UEX
, the example database universe that was introduced in Chapter 7. It will start out with basic queries and gradually move to more complex queries. For every query, we’ll present you with multiple (equivalent) formal expressions to specify the query.
For the first time in this book, you’ll see the use of SQL. Every example query will be accompanied by one or more equivalent SQL expressions. You’ll discover that expressing particular queries in SQL turns out to be not so trivial.
Note We assume you have a working knowledge of SQL. It is certainly not the goal of this book to teach the SQL language (although showing both the formal and the SQL expressions alongside each other might have this as a side effect). As mentioned in the book’s Introduction, the SQL expressions given in this chapter, as well as in the following two chapters, are compliant with the 10g release of Oracle’s SQL DBMS.
As usual, you’ll find a “Chapter Summary” at the end, followed by an “Exercises” section.
Data retrieval is a crucial application of a database. Through queries, you can extract information that is relevant to you from the database. In this section, we’ll demonstrate how you can formally specify queries.
Let’s first start with a simple example. We’ll use the example database design that was introduced in Chapter 7. Assume E
represents the employee table in some database state of universe DB_UEX
. Here’s a formal specification of a query, named Q1
, on E
that extracts the trainers who earn more than 6000
:
Q1(E) := { e | e∈E ∧ e(JOB)='TRAINER' ∧ e(MSAL)>6000 }
Query Q1
returns all employee tuples representing trainers earning more than 6000
. Instead of the complete tuple, if all you were interested in was the employee number (EMPNO
), name (ENAME
), and salary (MSAL
), of trainers earning more than 6000
, then the following query (Q2
) would do the job:
Q2(E) := { e↓{EMPNO,ENAME,MSAL} | e∈E ∧ e(JOB)='TRAINER' ∧ e(MSAL)>6000 }
Note Both Q1
and Q2
return a table as their result. As you’ll see shortly, this is not a requirement in the formal methodology set out by this book; queries are allowed to return simple values too (for instance, a single numeric or string value). However, in practice we prefer query specifications that return tables.
As indicated through the preceding example, you can view a query as a function. In this case, you pass this function a table as its argument, and it then returns another value based on the argument. Of course, queries often involve more than just one table. In general, a query can be considered to accept a database state (our formal structure that contains all current tables) as its argument. The query can then inspect any table available within this database state to determine its return value.
Let’s demonstrate this with an example too. Assume dbs
represents some database state of database universe DB_UEX
. Take a look at query Q3
; it returns the department that employs the president (if there is one):
Q3(dbs) := { d | d∈dbs(DEPT) ∧
(∃e∈dbs(EMP): e(DEPTNO)=d(DEPTNO) ∧ e(JOB)='PRESIDENT') }
Query Q3
inspects two tables held within state dbs
; expression dbs(DEPT)
denotes the department table, and expression dbs(EMP)
denotes the employee table. Because the argument of the query now represents a database state, the query can inspect any table available within that state.
As this example shows us, a query can formally be considered a function over a database universe. In other words, a query is a function whose domain is a set of database states. For every database state, the function returns (or if you will, computes) some value. The result of a query can be a value of any type. Query Q3
results in a table. If there is a president, then the result will be a table with one tuple; if there is no president, then the result will be the empty table. Database queries will typically result in tables. In fact, given the query language SQL, a query always results in a table. In the formalism introduced in this book, the result of a query can be as simple as a single numeric or string value too. Here is an example. Query Q4
returns the number of employees in department 10
:
Q4(dbs) := #{ e | e∈dbs(EMP) ∧ e(DEPTNO)=10 }
The result of query Q4
is a numeric value. Here is a formal specification of Q4
that returns a table:
Q4(dbs) := { { (DEPT_TEN_EMPS; #{ e | e∈dbs(EMP) ∧ e(DEPTNO)=10 }) } }
This version of Q4
returns a table over {DEPT_TEN_EMPS}
. It contains one tuple, with one attribute-value pair.
The next section will present many example queries over DB_UEX
to familiarize you further with formally specifying queries.
This section provides example queries. It starts out with basic queries and gradually moves to more complex queries. For every query, we’ll present you with the following:
Along the way, you’ll find out that it becomes increasingly more difficult to precisely specify a query informally; it becomes hard to avoid ambiguity. Also, you’ll see that SQL has some shortcomings that cause difficulties when you try to translate a formal query specification directly into a SELECT
expression.
Listings 9-1 through 9-11 introduce the example queries. All queries are named EXQi
(example query i
), where i
represents a sequence number. In the formal specifications given for each query, dbs
represents a database state of DB_UEX
. The various alternative formal expressions are named FEc
(formal expression c
), where c
represents a letter (a
, b
, c
, and so on).
The SQL expressions are based on an implementation of the DB_UEX
database design. We’ll introduce this implementation in detail in Chapter 11. For now it suffices to mention that in the implementation of DB_UEX
, every table universe is implemented one on one as an SQL table, and that NULL
s (a concept not known in our formal methodology) are explicitly excluded from that implementation.
In the SQL specifications we’ll use the table structure names introduced by database characterization DB_CHREX
(see Listing 7-38) as the table names available for SELECT
expressions. We’ll name alternative SQL specifications SEr
(SQL expression r
), where r
represents a Roman numeral (I
, II
, III
, and so on).
Note Unless pointed out in the accompanying text, there is no implied relationship between the respective formal and SQL alternative expressions that are listed for every query.
Listing 9-1 gives the data of all departments.
FEa: dbs(DEPT)
FEb: { d | d∈dbs(DEPT) }
SEI: select d.* from DEPT d
SEII: select d.LOC, d.DEPTNO, d.DNAME, d.MGR from DEPT d
This query is straightforward. It retrieves the department table from given database state dbs
: dbs(DEPT)
. Alternative FEb
is a somewhat less succinct way to specify this. In SEI
the star (*
) symbol is used as the shorthand to denote all attributes of the department table in this case. SEII
actually lists all attributes (in some arbitrary order).
Listing 9-2 gives the job of employees who either earn more than 4800
monthly, or who are assigned to salary grade 6
or higher.
FEa: { e(JOB) | e∈dbs(EMP) ∧ (e(MSAL) > 4800 ∨ e(SGRADE) ≥ 6) }
FEb: { e↓{JOB} | e∈dbs(EMP) ∧ (e(MSAL) > 4800 ∨ e(SGRADE) ≥ 6) }
FEc: { e↓(JOB) | e∈dbs(EMP) ∧ (e(MSAL) ≤ 4800 ⇒ e(SGRADE) ≥ 6) }
FEd: { e↓(JOB) | e∈dbs(EMP) ∧ (e(SGRADE) < 6 ⇒ e(MSAL) > 4800) }
FEe: { e↓(JOB) | e∈dbs(EMP) ∧ ¬(e(MSAL) ≤ 4800 ∧ e(SGRADE) < 6) }
FEf: { e↓(JOB) | e∈dbs(EMP) ∧ e(MSAL) > 4800 }
∪
{ e↓(JOB) | e∈dbs(EMP) ∧ e(SGRADE) ≥ 6}
SEI: select distinct e.JOB from EMP e where e.MSAL > 4800 or e.SGRADE >= 6
SEII: select distinct e.JOB from EMP e where not (e.MSAL <= 4800 and e.SGRADE < 6)
SEIII: select e.JOB from EMP e where e.MSAL > 4800
union
select e.JOB from EMP e where e.SGRADE >= 6
Do you notice the subtle difference between FEa
and FEb
? The former results in a set of job values (that is, this is not a table); the latter results in a set of tuples (that is, a table over {JOB}
), and is therefore a preferred way to formally specify EXQ2
.
Alternatives FEc
, FEd
, and FEe
are variations of FEb
, where the disjunction is either rewritten into an implication or a conjunction. Alternative FEf
uses the union operator; jobs of employees earning more than 4800
are unioned with jobs of employees assigned to salary grade 6
or higher.
Note The way this query is formally specified implies that we only retrieve distinct job values. Suppose table dbs(EMP)
currently holds twenty employees, of which ten conform to the predicate “monthly salary is greater than 4800
or salary grade equals 6
or higher.” If eight of those ten employees have job TRAINER
and the other two have job MANAGER
, then the result of FEa
will be {'TRAINER','MANAGER'}
: a set with only two, not ten, elements.
The subtle difference demonstrated by FEa
and FEb
cannot be expressed in SQL. Specification SEI
is the SQL variation for both of these. Also note that in SQL you must add the distinct
keyword to ensure that no duplicate job values appear in the query result.
Alternatives FEc
and FEd
cannot be directly transformed into SQL, because SQL lacks the implication operator; only disjunction, conjunction, and negation are available within SQL.
Specification SEII
is the SQL version for FEe
, and SEIII
represents the SQL version for FEf
. Note that in the latter case, the use of the keyword distinct
is not necessary; SQL will always return distinct values whenever you use a set operator (union, minus, or intersect).
Listing 9-3 gives the number, name, and salary of clerks, including the lower and upper limits of the salary grade to which they are assigned.
FEa: { e↓{EMPNO,ENAME,MSAL,LLIMIT,ULIMIT} | e∈dbs(EMP)⊗
(dbs(GRD)◊◊{(SGRADE;GRADE),(LLIMIT;LLIMIT),(ULIMIT;ULIMIT)}) ∧
e(JOB)='CLERK' }
SEI: select e.EMPNO, e.ENAME, e.MSAL, g.LLIMIT, g.ULIMIT
from EMP e
,GRD g
where e.SGRADE = g.GRADE
and e.JOB = 'CLERK'
This query is represented by a join between the employee and salary grade tables. To be able to use the join operator, you must rename either the attribute SGRADE
in the employee table, or the attribute GRADE
in the salary grade table. Specification FEa
performs the latter renaming. You can derive the SQL expression SEI
directly from FEa
.
Now you’re probably thinking that the SQL expression looks much easier than the formal expression, right? This is because this example is so simple. When we get to more complicated examples, you’ll see that this state of affairs ceases to apply; the SQL expressions will then become more complicated than the mathematical ones.
Listing 9-4 gives the employees (EMPNO
and ENAME
) who manage exactly two departments at two different locations.
FEa: { e↓{EMPNO,ENAME} | e∈dbs(EMP) ∧
#{ d(LOC) | d∈dbs(DEPT) ∧ d(MGR)=e(EMPNO) } = 2 }
FEb: { e↓{EMPNO,ENAME} | e∈dbs(EMP) ∧
(∃d1,d2∈dbs(DEPT): d1(MGR)=e(EMPNO) ∧
d2(MGR)=e(EMPNO) ∧
d1(LOC)≠d2(LOC)) }
FEc: { e↓{EMPNO,ENAME} | e∈ (dbs(EMP)⇓{EMPNO,ENAME})⊗
(dbs(DEPT)◊◊{(EMPNO;MGR),(LOC1;LOC)})⊗
(dbs(DEPT)◊◊{(EMPNO;MGR),(LOC2;LOC)}) ∧
e(LOC1)≠e(LOC2) }
SEI: select e.EMPNO, e.ENAME
from EMP e
where 2 = (select count(distinct d.LOC)
from DEPT d
where d.MGR = e.EMPNO)
SEII: select e.EMPNO, e.ENAME
from EMP e
where exists(select d1.*
from DEPT d1
where d1.MGR = e.EMPNO
and exists(select d2.*
from DEPT d2
where d2.MGR = e.EMPNO
and d1.LOC <> d2.LOC))
SEIII: select e.EMPNO, e.ENAME
from EMP e
where exists(select *
from DEPT d1
,DEPT d2
where d1.MGR = e.EMPNO
and d2.MGR = e.EMPNO
and d1.LOC <> d2.LOC)
SEIV: select distinct e.EMPNO, e.ENAME
from EMP e
,DEPT d1
,DEPT d2
where e.EMPNO = d1.MGR
and e.EMPNO = d2.MGR
and d1.LOC <> d2.LOC
Specification FEa
states that the cardinality of the set of locations of departments managed by the employee must be two. Alternative FEb
achieves this requirement by existentially quantifying over two departments at different locations that are managed by the employee.
Note In FEb
the shorthand notation for two nested existential quantifiers over the same set is used.
You can translate both formal specifications into SQL directly. Note that in SEI
you must use the distinct
keyword within the count
aggregate function. Specification SEII
is given to demonstrate the nested use of the existential quantifier available in SQL—the expression exists(...)
. SEIII
is a shorthand version similar to the shorthand used in formal specification FEb
.
Specification FEc
uses the join to locate two departments at different locations that are managed by the employee. Note that in its SQL version (SEIV
), you must use the distinct
keyword to prevent employees who manage two departments at different locations from appearing twice in the result set.
Listing 9-5 gives the number and name of all managed employees earning more than 90 percent of their boss’s salary.
FEa: { e1↓{EMPNO,ENAME} | e1∈dbs(EMP) ∧ m∈dbs(MEMP) ∧ e2∈dbs(EMP) ∧
e1(EMPNO) = m(EMPNO) ∧ m(MGR) = e2(EMPNO) ∧
e1(MSAL) > 0.9*e2(MSAL) }
FEb: {e↓{EMPNO,ENAME} | e∈ (dbs(EMP)⊗dbs(MEMP))⊗
(dbs(EMP)◊◊{ (MGR;EMPNO), (MGR_MSAL;MSAL) }) ∧
e(MSAL) > 0.9*e(MGR_MSAL) }
FEc: {e1↓{EMPNO,ENAME} | e1∈dbs(EMP) ∧
(∃m∈dbs(MEMP): m(EMPNO)=e1(EMPNO) ∧
(∃e2∈dbs(EMP): e2(EMPNO)=m(MGR) ∧
e1(MSAL) > 0.9*e2(MSAL) )) }
SEI: select e1.EMPNO, e1.ENAME
from EMP e1, MEMP m, EMP e2
where e1.EMPNO = m.EMPNO
and m.MGR = e2.EMPNO
and e1.MSAL > 0.9 * e2.MSAL
SEII: select e1.EMPNO, e1.ENAME
from EMP e1
where exists (select m.*
from MEMP m
where m.EMPNO = e1.EMPNO
and exists (select e2.*
from EMP e2
where e2.EMPNO = m.MGR
and e1.MSAL > 0.9 * e2.MSAL))
By combining every EMP
tuple with the corresponding MEMP
tuple (via specialization PSPEC2
), and then combining the MEMP
tuple back to an EMP
tuple (via subset requirement PSSR3
), you are able to compare the monthly salary of a managed employee with the monthly salary of his or her manager. Specification FEa
achieves this by binding two tuple variables for EMP
(one for the employee and one for the manager) and a tuple variable for MEMP
, and then imposing the combination restrictions on those. Specification FEb
does this by binding a single tuple variable that draws its values from a nested join that performs the combination. The first join combines tuples from EMP
with MEMP
(to find the employee number of the manager). The second join combines tuples from MEMP
back to EMP
(to find the salary of the manager) and requires attribute renaming.
Specification SEI
represents an SQL version for FEa
and FEb
.
Specification FEc
employs a nested existential quantification to achieve the same result. You can translate it directly into SQL (see SEII
) using the support for existential quantification in SQL.
Listing 9-6 gives the code, description, and category for all courses for which every scheduled offering has a maximum capacity of fewer than ten students.
FEa: { c↓{CODE,DESCR,CAT} | c∈dbs(CRS) ∧
(∀o∈dbs(OFFR): (o(COURSE) = c(CODE) ∧
o(STATUS) = 'SCHD') ⇒
o(MAXCAP) < 10 ) }
FEb: { c↓{CODE,DESCR,CAT} | c∈dbs(CRS) ∧
¬(∃o∈dbs(OFFR): o(COURSE) = c(CODE) ∧
o(STATUS) = 'SCHD' ∧
o(MAXCAP) ≥ 10 ) }
FEc: { c↓{CODE,DESCR,CAT} | c∈dbs(CRS) ∧
c(CODE) ∉ { o(COURSE) | o∈dbs(OFFR) ∧
o(STATUS) = 'SCHD' ∧
o(MAXCAP) ≥ 10 ) } }
FEd: { c↓{CODE,DESCR,CAT} | c∈dbs(CRS) ∧
#{ o1 | o1∈dbs(OFFR) ∧ o1(COURSE) = c(CODE) ∧
o1(STATUS) = 'SCHD' }
=
#{ o2 | o2∈dbs(OFFR) ∧ o2(COURSE) = c(CODE) ∧
o2(STATUS) = 'SCHD' ∧
o2(MAXCAP) < 10 ) } }
SEI: select c.CODE, c.DESC, c.CAT
from CRS c
where not exists (select o.*
from OFFR o
where o.COURSE = c.CODE
and o.STATUS = 'SCHD'
and o.MAXCAP >= 10)
SEII: select c.CODE, c.DESC, c.CAT
from CRS c
where c.CODE not in (select o.COURSE
from OFFR o
where o.STATUS = 'SCHD'
and o.MAXCAP >= 10)
SEIII: select c.CODE, c.DESC, c.CAT
from CRS c
where (select count(*)
from OFFR o1
where o1.COURSE = c.CODE
and o1.STATUS = 'SCHD')
=
(select count(*)
from OFFR o2
where o2.COURSE = c.CODE
and o2.STATUS = 'SCHD'
and o2.MAXCAP < 10)
Specification FEa
closely follows the informal description for the query. FEb
is a rewritten version of FEa
; the universal quantifier is rewritten into an existential quantifier. Informally it queries all courses for which there do not exist a scheduled offering with a maximum capacity of ten or more. Because the ability to rewrite predicates into equivalent predicates is extremely important, we’ll demonstrate once more how this is performed in this case:
(∀o∈dbs(OFFR): (o(COURSE)=c(CODE) ∧ o(STATUS)='SCHD') ⇒ o(MAXCAP)<10 )
⇔ /* Add double negation */
¬¬(∀o∈dbs(OFFR): (o(COURSE)=c(CODE) ∧ o(STATUS) = 'SCHD') ⇒ o(MAXCAP)<10 )
⇔ /* Bring one negation into the quantification, quantifier changes */
¬(∃o∈dbs(OFFR): ¬((o(COURSE)=c(CODE) ∧ o(STATUS)='SCHD') ⇒ o(MAXCAP)<10 ))
⇔ /* Transform implication into disjunction */
¬(∃o∈dbs(OFFR): ¬(¬(o(COURSE)=c(CODE) ∧ o(STATUS)='SCHD') ∨ o(MAXCAP)<10 ))
⇔ /* Apply De Morgan, disjunction changes into conjuction */
¬(∃o∈dbs(OFFR): ¬¬(o(COURSE)=c(CODE) ∧ o(STATUS)='SCHD') ∧ ¬o(MAXCAP)<10 )
⇔ /* Get rid of negations */
¬(∃o∈dbs(OFFR): o(COURSE)=c(CODE) ∧ o(STATUS)='SCHD' ∧ o(MAXCAP)≥10 )
You’ll often have to rewrite a universal quantification into an existential quantification for reasons that will soon become clear.
Specification FEc
is a variation of FEb
; the “not exists” is now expressed as a “not an element of” some set. FEd
might seem somewhat far-fetched; only those courses are included in the query result, for which the number of scheduled course offerings equals the number of scheduled course offerings that have a maximum capacity of fewer than ten students. By imposing this restriction, you end up precisely with those courses for which every scheduled offering has a maximum capacity of fewer than ten students.
SQL lacks the possibility to express universal quantification directly.
Note Remember we are using Oracle’s version of SQL in this book. The SQL standard offers a language construct called <quantified comparison predicate>
that can be used to specify universal quantification in SQL. However, Oracle’s SQL DBMS doesn’t support this language construct.
For this reason, you cannot directly translate FEa
into SQL; you must rewrite the universal quantification into an existential quantification (which SQL does support). Once you have done this (which is what alternative FEb
is all about), you can straightforwardly compose the equivalent SQL expression for the query; see SEI
. Alternative SEII
closely follows FEb
too, but instead of the “not exists,” it employs the “not in” operator available within SQL.
Alternative SEIII
is the SQL version of FEc
; this way of query formulation can be considered a trick to mimic a universal quantification within SQL.
What is your opinion on courses that currently do not have any scheduled offering at all? Should they appear in the result set of query EXQ6
, or not?
Note Remember the truth value of a universal quantification over the empty set? Such a quantification is always true, regardless of the predicate that is being quantified.
Courses with no scheduled offerings will appear in the result set of this query. Whenever you encounter a universal quantification—or a negated existential quantification—you should be aware of this consequence (warning bells should ring). Moreover, whenever the set that is quantified over can possibly be the empty set, you should double check with your customer (the end user) whether he or she is aware of this “side effect” and requires this behavior of the query.
If courses with no scheduled offerings at all should not appear in the result set of this query, then here is how you could change FEa
to reflect this:
{ c↓{CODE,DESCR,CAT} | c∈dbs(CRS) ∧
(∃o∈dbs(OFFR): o(COURSE) = c(CODE) ∧
o(STATUS) = 'SCHD') ∧
(∀o∈dbs(OFFR): (o(COURSE) = c(CODE) ∧
o(STATUS) = 'SCHD') ⇒
o(MAXCAP) < 10 ) }
Adding the existential quantification now ensures that courses with no scheduled offerings won’t appear in the result set.
In the next example, you’ll see the use of an NVL
function and the outer join syntax (+)
. Both constructs are available with the Oracle SQL DBMS. We’ll explain their meaning directly after Listing 9-7.
Listing 9-7 gives for each department the department number, name, and number of employees working in that department, as well as the total sum of monthly salaries of employees working in that department.
FEa: { d↓{DEPTNO,DNAME} ∪
{ (NUM_EMP; #{ e1 | e1∈dbs(EMP) ∧ e1(deptno)=d(deptno) })
,(SUM_MSAL; (SUM e2∈{ e3 | e3∈dbs (EMP) ∧ e3(deptno)=d(deptno) }:
e2(MSAL)))
}
| d∈dbs(DEPT) }
FEb: { d ∪
{ (NUM_EMP; #{ e1 | e1∈dbs(EMP) ∧ e1(deptno)=d(deptno) })
,(SUM_MSAL; (SUM e2∈{ e3 | e3∈dbs(EMP) ∧ e3(deptno)=d(deptno) }:
e2(MSAL)))
}
| d∈dbs(DEPT)⇓{DEPTNO,DNAME} }
SEI: select d.DEPTNO
,d.DNAME
,(select count(*)
from EMP e1
where e1.deptno = d.deptno) as NUM_EMP
,(select nvl(sum(MSAL),0)
from EMP e2
where e1.deptno = d.deptno) as SUM_MSAL
from DEPT d
SEII: select d.DEPTNO
,d.DNAME
,count(*) as NUM_EMP
,sum(e.MSAL) as SUM_MSAL
from DEPT d
,EMP e
where d.DEPTNO = e.DEPTNO
group by d.DEPTNO,d.DNAME
SEIII: select d.DEPTNO
,d.DNAME
,count(e.EMPNO) as NUM_EMP
,nvl(sum(e.MSAL),0) as SUM_MSA
from DEPT d
,EMP e
where d.DEPTNO = e.DEPTNO (+)
group by d.DEPTNO,d.DNAME
Do you see how the additional attributes NUM_EMP
and SUM_MSAL
are constructed for the query result? Query EXQ5
returns a table over {DEPTNO,DNAME,NUM_EMP,SUM_MSAL}
. For every tuple in dbs(DEPT)
, a tuple appears in the result set. The first two attributes are drawn directly from a tuple of dbs(DEPT)
. You specify the other two attributes (NUM_EMP
and SUM_MSAL
) separately and add them to the first two attributes through tuple union. You specify the value for NUM_EMP
as the cardinality of the set that holds all employees for the current department. In a similar way, you specify the value for attribute SUM_MSAL
by employing the SUM
aggregate operator.
The difference between FEa
and FEb
is the “moment” at which the department tuples are limited to attributes DEPTNO
and DNAME
. Specification FEa
performs tuple limitation in the expression before the bar (|)
; FEb
performs table projection on the table to which variable d
is bound (after the bar).
You should note the following:
NUM_EMP
as well as the SUM_MSAL
attributes have well-defined values for the empty departments. The cardinality of the empty set is well defined (it is zero), and we’ve defined the sum operator (see Definition 2-12) such that the sum of any expression over the empty set is equal to zero.Let’s now look at the SQL expressions SEI
, SEII
, and SEIII
. Here is a fine example of where the can of worms (mentioned in the book’s Introduction) is opened.
Specification SEI
follows the formal specification for this query; subqueries are used to compute the number of employees and the sum of the salaries. However, note that SQL’s count
operator is defined to be zero for the empty table, but SQL’s sum
operator is defined to return NULL
when performed over an empty table. To ensure that in the cases where the department employs no employees the SQL query will return 0
(zero) as the sum of the salaries, we have employed the dyadic NVL
function. The semantics of this function are as follows: if the first argument represents a value other than NULL
, then NVL
returns this first argument, otherwise NVL
returns the second argument.
Specification SEII
is often produced for these types of questions. You perform a join from DEPT
to EMP
, and by grouping the rows (GROUP BY
), you can compute the two aggregate attributes. However, note that this SQL query won’t return empty departments, and therefore doesn’t represent a correct SQL version for our original query. By the way, because this query only returns departments that employ at least one employee, the SUM
aggregation will never result in NULL
, and therefore you can discard the NVL
.
Now, if you are a proficient SQL programmer, you probably know how to fix this, right? Specification SEIII
represents the repaired version for SEII
; it employs the outer join to ensure that empty departments are returned too in the result set. The addition of (+)
behind e.DEPTNO
changes the semantics of the original join into an outer join from table DEPT
to table EMP
. Outer joins will also return a row for the “from” table when no matching row can be found in the “to” table; an all-NULL
s row will be generated and used to combine with the “from” table row.
You should be aware, though, that in the repaired query SEIII
, you must now change the count(*)
into—for instance—count(e.EMPNO)
to have SQL return the correct value of zero employees for the NUM_EMP
attribute (count(*)
would have returned one for empty departments). Yes, the can of worms is wide open now.
The next query retrieves for each department the employee who has the highest salary within that department. Note that it’s possible for two or more employees to have that same highest salary. Listing 9-8 gives for each department (DEPTNO
and DNAME
) the one or more employees (EMPNO
, ENAME
, and MSAL
) within that department who have the highest salary.
FEa: { d↓{DEPTNO,DNAME} ∪
{ (RICHEMP; { e1↓{EMPNO,ENAME,MSAL} | e1∈dbs(EMP) ∧ e1(DEPTNO)=d(DEPTNO) ∧
¬(∃e2∈dbs(EMP): e2(DEPTNO)=e1(DEPTNO) ∧
e2(MSAL)>e1(MSAL)) }
)
}
| d∈dbs(DEPT) }
FEb: dbs(DEPT)⇐{DEPTNO,DNAME}⊗
{ e1↓{EMPNO,ENAME,MSAL,DEPTNO} | e1∈dbs(EMP) ∧
¬(∃e2∈dbs(EMP): e2(DEPTNO)=e1(DEPTNO) ∧
e2(MSAL)>e1(MSAL)) }
FEc: dbs(DEPT)⇐{DEPTNO,DNAME}⊗
{ e1↓{EMPNO,ENAME,MSAL,DEPTNO} | e1∈dbs(EMP) ∧
(∀e2∈dbs(EMP): e2(DEPTNO)=e1(DEPTNO) ⇒
e2(MSAL)≤e1(MSAL)) }
SEI: select d.DEPTNO, d.DNAME
,cursor(select e1.EMPNO, e1.ENAME, e1.MSAL
from EMP e1
where e1.DEPTNO = d.DEPTNO
and not exists(select e2.*
from EMP e2
where e2.DEPTNO = e1.DEPTNO
and e2.MSAL > e1.MSAL)) as RICHEMP
from DEPT d
SEII: select d.DEPTNO, d.DNAME, e.EMPNO, e.ENAME, e.MSAL
from DEPT d
,(select e1.EMPNO,e1.ENAME,e1.MSAL,e1.DEPTNO
from EMP e1
where not exists(select e2.*
from EMP e2
where e2.DEPTNO = e1.DEPTNO
and e2.MSAL > e1.MSAL)) e
where d.DEPTNO = e.DEPTNO
Note the difference in structure of the result sets specified by FEa
and FEb
. Query FEa
returns a table over {DEPTNO,DNAME,RICHEMP}
; one tuple per department is returned (empty departments are returned too). Under attribute RICHEMP
, a nested table over {EMPNO,ENAME,MSAL}
is returned. Attribute RICHEMP
holds the set of employee tuples (limited to three attributes) that represent the employees with the highest salary in that department. Query FEb
returns a table over {DEPTNO,DNAME,EMPNO,ENAME,MSAL}
. This query doesn’t return empty departments. Also, if multiple employees all earn the same highest salary within a department, then the department data is repeated for these employees.
Let’s clarify this further by giving example tables for dbs(DEPT)
and dbs(EMP)
. Figure 9-1 shows projected versions of a department and an employee table in some given state dbs
of DB_UEX
.
Given these two tables, version FEa
for query EXQ8
would return the following set:
{ {(DEPTNO;10), (DNAME;'RESEARCH'),
(RICHEMP; { {(EMPNO;1002), (ENAME;'SUSAN'), (MSAL;5500)}
,{(EMPNO;1003), (ENAME;'BRITNEY'), (MSAL;5500)} })}
,{(DEPTNO;11), (DNAME;'SALES'), (RICHEMP; Ø)}
,{(DEPTNO;12), (DNAME;'MARKETING'),
(RICHEMP; { {(EMPNO;1005), (ENAME;'DEBBY'), (MSAL;6200)} })}
}
Version FEb
would return this set:
{ {(DEPTNO;10), (DNAME;'RESEARCH'), (EMPNO;1002), (ENAME;'SUSAN'), (MSAL;5500)}
,{(DEPTNO;10), (DNAME;'RESEARCH'), (EMPNO;1003), (ENAME;'BRITNEY'),(MSAL;5500)}
,{(DEPTNO;12), (DNAME;'MARKETING'),(EMPNO;1005), (ENAME;'DEBBY'), (MSAL;6200)}
}
Specification FEc
once more demonstrates the rewriting of a quantifier; it differs from FEb
in that the existential quantifier is rewritten into a universal quantifier.
SQL expression SEI
demonstrates that a specification such as FEa
can be translated directly into SQL. By using the keyword cursor
, you indicate that a subquery employed within the SELECT
clause can potentially return multiple rows. SQL expression SEII
is a direct translation for FEb
; it employs a subquery within the FROM
clause.
Note In Listing 9-9 that follows, we use a function to_year
in expressions FEa
and FEb
. This function is defined to return the year component of a given date value. In Oracle SQL, you can achieve the same through the to_char
function, as demonstrated in SEI
and SEII
.
Listing 9-9 gives the employees (EMPNO
and ENAME
) who attended at least one offering of every course that was given by trainer “De Haan” (employee number 1206
) in 2003.
FEa: {e↓{EMPNO,ENAME} | e∈dbs(EMP) ∧
{ o(COURSE) | o∈dbs(OFFR) ∧ o(TRAINER)=1206 ∧
o(STATUS)='CONF' ∧ to_year(o(STARTS))=2003 }
⊆
{ r(COURSE) | r∈dbs(REG)⊗dbs(OFFR) ∧
r(STUD)=e(EMPNO) ∧ r(TRAINER)=1206 ∧
r(STATUS)='CONF' ∧ to_year(r(STARTS))=2003 }
}
FEb: { e↓{EMPNO,ENAME} | e∈dbs(EMP) ∧
(∀o1∈{ o2↓{COURSE} | o2∈dbs(OFFR) ∧ o2(TRAINER)=1206 ∧
o2(STATUS)='CONF' ∧ to_year(o2(STARTS))=2003 }:
(∃r∈dbs(REG)⊗dbs(OFFR): r(COURSE)=o1(COURSE) ∧
r(STUD)=e(EMPNO) ∧ r(TRAINER)=1206 ∧
r(STATUS)='CONF' ∧ to_year(r(STARTS))=2003))
}
SEI: select e.EMPNO,e.ENAME
from EMP e
where 0 = (select count(*)
from (select o1.COURSE
from OFFR o1
where o1.TRAINER = 1206
and o1.STATUS = 'CONF'
and to_char(o1.STARTS,'YYYY') = '2003'
minus
select o2.COURSE
from OFFR o2
,REG r
where o2.COURSE = r.COURSE
and o2.STARTS = r.STARTS
and r.STUD = e.EMPNO
and o2.TRAINER = 1206
and o2.STATUS = 'CONF'
and to_char(o2.STARTS,'YYYY') = '2003'))
SEII: select e.EMPNO,e.ENAME
from EMP e
where not exists(select o1.*
from (select o.COURSE
from OFFR o
where o.TRAINER = 1206
and o.STATUS = 'CONF'
and to_char(o.STARTS,'YYYY') = '2003') o1
where not exists(select r.*
from REG r
,OFFR o2
where r.COURSE = o2.COURSE
and r.STARTS = o2.STARTS
and r.STUD = e.EMPNO
and o2.STATUS = 'CONF'
and to_char(r.STARTS,'YYYY') = '2003'
and r.TRAINER = 1206
and r.COURSE = o1.COURSE))
Formal specification FEa
queries all employees for which the set of distinct courses given by De Haan in 2003 is a subset of the set of courses for which an offering (taught by De Haan) was attended by the employee in 2003. Note that for an employee to have attended a course offering, the following must be true:
Specification FEb
is a rewrite of the subset operator in FEa
into quantifiers that follow from the definition of the subset operator. Set A
is a subset of B
if and only if every element of A
is an element of B
.
(A ⊆ B) ⇔ (∀a∈A: ∃b∈B: b=a)
The rewrite rule that we have applied is slightly more sophisticated; it is of the following structure:
{ a | a∈A ∧ P(a) } ⊆ { b | b∈B ∧ Q(b) }
⇔
(∀a1∈{ a2 | a2∈A ∧ P(a2) }: (∃b∈B: b=a1 ∧ Q(b)))
SQL does not support all set operators: union, intersection, and difference are available; symmetric difference is not, nor is the subset of operator. Fortunately, you can rewrite the subset operator using other available set operators:
(A ⊆ B) ⇔ ((A − B) = Ø)
Set A
is a subset of B
if and only if the difference A
minus B
represents the empty set. To check for an empty set in SQL, you would have to apply one more rewrite rule:
((A − B) = Ø) ⇔ (#(A − B) = 0)
Instead of checking that A
minus B
represents the empty set, you now check that the cardinality of A
minus B
equals zero. The preceding rewrite rules were applied to FEa
to end up with SQL expression SEI
.
Alternative SEII
represents the SQL version for the formal version where the subset operator was rewritten into quantifiers (FEb
). Note that (again) the universal quantifier had to be further rewritten into an existential quantifier.
Listing 9-10 gives courses (CODE
and DESCR
) for which all offerings (and there must be at least one) in 2006 were given by only one trainer, and for which none of these offerings was attended by another trainer.
FEa: {c↓{CODE,DESCR}
| c∈dbs(CRS) ∧ /* there is one trainer for all offerings of this course */
(∃e∈dbs(EMP):
(∀o∈dbs(OFFR): (o(COURSE)=c(CODE) ∧ o(STATUS)='CONF' ∧
to_year(o(STARTS))=2006) ⇒
o(TRAINER)=e(EMPNO) )
∧ /* course was offered in 2006 */
(∃o∈dbs(OFFR): o(COURSE)=c(CODE) ∧ o(STATUS)='CONF' ∧
to_year(o(STARTS))=2006)
∧ /* none of the offerings was attended by another trainer */
¬(∃r∈dbs(REG)⊗
(dbs(EMP)◊◊{(STUD;EMPNO),(JOB;JOB)})⊗dbs(OFFR):
r(COURSE)=c(CODE) ∧ r(STATUS)='CONF' ∧
to_year(r(STARTS))=2006 ∧ r(JOB)='TRAINER')
}
FEb: {c↓{CODE,DESCR}
| c∈dbs(CRS) ∧ /* there is one trainer for all offerings of this course */
1 = #{ o(TRAINER) | o∈dbs(OFFR) ∧ o(COURSE)=c(CODE) ∧
o(STATUS)='CONF' ∧ to_year(o(STARTS))=2006 }
∧ /* none of the offerings was attended by another trainer */
(∀r∈dbs(REG)⊗
(dbs(EMP)◊◊{(STUD;EMPNO),(JOB;JOB)})⊗dbs(OFFR):
(r(COURSE)=c(CODE) ∧ r(STATUS)='CONF' ∧
to_year(r(STARTS))=2006) ⇒ r(JOB)≥'TRAINER')
}
SEI: select c.CODE
,c.DESCR
from CRS c
where exists(select e.*
from EMP e
where not exists(select o2.*
from OFFR o2
where o2.COURSE = c.CODE
and o2.STATUS = 'CONF'
and to_char(o2.STARTS,'YYYY') = '2006'
and o2.TRAINER <> e.EMPNO))
and exists(select o1.*
from OFFR o1
where o1.COURSE = c.CODE
and o1.STATUS = 'CONF'
and to_char(o1.STARTS,'YYYY') = '2006')
and not exists(select r.*
from REG r
,EMP e
,OFFR o3
where r.STUD = e.EMPNO
and r.COURSE = o3.COURSE
and r.STARTS = o3.STARTS
and r.COURSE = c.CODE
and o3.STATUS = 'CONF'
and to_char(o3.STARTS,'YYYY') = '2006'
and e.JOB = 'TRAINER')
SEII: select c.CODE, c.DESCR
from COURSE c
where 1 = (select count(distinct o1.TRAINER)
from OFFR o1
where o1.COURSE = c.CODE
and o1.STATUS = 'CONF'
and to_char(o1.STARTS,'YYYY') = 2006)
and not exists(select r.*
from REG r
,EMP e
,OFFR o2
where r.STUD = e.EMPNO
and r.COURSE = o2.COURSE
and r.STARTS = o2.STARTS
and r.COURSE = c.CODE
and o2.STATUS = 'CONF'
and to_char(o2.STARTS,'YYYY') = '2006'
and e.JOB = 'TRAINER')
Note the additional conjunct (the one with comment /* course was offered in 2006 */
) inside the specification of FEa
to ensure that the universal quantification inside the first conjunct does not involve the empty set.
Specification FEb
takes care of this by stating that the cardinality of all distinct trainers, who have given an offering in 2006 for the course, is at least one. By stating that this cardinality should be exactly one, you specify that the course was given by no more than one trainer.
You can translate both formal specifications into SQL; FEa
requires a rewrite of the universal quantifier.
Listing 9-11 finds all departments that violate the employee table constraint stating that departments that employ the president or a manager should also employ at least one administrator.
FEa: { d | d∈dbs(DEPT) ∧
(∃e1∈dbs(EMP): e1(DEPTNO)=d(DEPTNO) ∧
e1(JOB)∈{'PRESIDENT','MANAGER'}) ∧
¬(∃e2∈E: e2(DEPTNO)=d ∧ e2(JOB) = 'ADMIN' ) }
FEb: { d | d∈dbs(DEPT) ∧
(∃e1∈dbs(EMP): e1(DEPTNO)=d(DEPTNO) ∧
e1(JOB)∈{'PRESIDENT','MANAGER'}) ∧
(∀e2∈dbs(EMP): e2(JOB)='ADMIN' ⇒ e2(DEPTNO)≠d(DEPTNO)) }
SEI: select d.*
from DEPT d
where exists(select e1.*
from EMP e1
where e1.DEPTNO = d.DEPTNO
and e1.JOB in ('PRESIDENT','MANAGER'))
and not exists(select e2.*
from EMP e2
where e2.DEPTNO = d.DEPTNO
and e1.JOB = 'ADMIN')
Listing 7-26 introduced the formal specification for this table constraint. Using this specification, you can derive the formal expression for query EXQ11
. Because you want to find departments that violate this constraint, all you have to do is negate the constraint specification; just prefix the specification with ¬.
Here it is (note that in this specification, free variable E
represents an employee table):
¬(∀d∈{ e1(DEPTNO) | e1∈E }:
(∃e2∈E: e2(DEPTNO)=d ∧ e2(JOB)∈{'PRESIDENT','MANAGER'} )
⇒
(∃e3 ∈E: e3(DEPTNO)=d ∧ e3(JOB)='ADMIN' ) )
If you then rewrite this specification by bringing the negation into the quantification, you end up with the following specification (verify this for yourself):
(∃d∈{ e1(DEPTNO) | e1∈E }:
(∃e2∈E: e2(DEPTNO)=d ∧ e2(JOB)∈{'PRESIDENT','MANAGER'} ) ∧
¬(∃e3∈E: e3(DEPTNO)=d ∧ e3(JOB)='ADMIN' ) )
Formal expression FEa
now follows straightforwardly from the preceding rewritten constraint specification. FEb
is a rewrite of FEa
; a universal quantifier replaces the negated existential quantifier.
You can easily derive the SQL text (SEI
) from specification FEa
.
You might have noticed that because SQL lacks universal quantification and implication, certain SQL expressions end up having more negations than their formally specified counterpart did. SEII
of EXQ9
is a fine example of this; it contains a nested negated existential quantifier. It is difficult to understand the meaning of such a query. The formal counterpart that employs the universal quantifier is easier to understand.
In general we (humans) have difficulty understanding expressions that contain negations; the more negations, the more difficult it gets for us. Whenever you encounter negations in formal specifications, you should investigate the opportunity of rewriting such a specification into an expression holding fewer negations. You can use the rewrite rules introduced in Part 1 of this book to achieve this.
Note By rewriting formal specifications such that they hold fewer negations, you not only help yourself; you also enable clearer informal specifications that your customer (the user) will definitely appreciate.
Because they are so important, we list rewrite rules that involve negations once again here. Table 9-1 contains various alternatives for the De Morgan, implication, and quantifier rewrite rules.
Expression | Is Equivalent To | |
¬¬A |
⇔ | A |
¬A ⇒ ¬B |
⇔ | B ⇒ A |
¬A ⇒ B |
⇔ | ¬B ⇒ A |
A ⇒ ¬B |
⇔ | B ⇒ ¬A |
A ⇒ ¬B |
⇔ | (A ∧ B) ⇒ FALSE |
¬A ∧ ¬B |
⇔ | ¬( A ∨ B ) |
A ∧ ¬B |
⇔ | ¬( ¬A ∨ B ) |
¬A ∧ B |
⇔ | ¬( A ∨ ¬B ) |
¬A ∨ ¬B |
⇔ | ¬( A ∧ B ) |
A ∨ ¬B |
⇔ | ¬( ¬A ∧ B ) |
¬A ∨ B |
⇔ | ¬( A ∧ ¬B ) |
¬( ∃x ∈X: ¬P(x)) |
⇔ | ( ∀x ∈X: P(x)) |
( ∃x ∈X: ¬P(x)) |
⇔ | ¬( ∀x ∈X: P(x)) |
¬( ∃x ∈X: P(x)) |
⇔ | ( ∀x ∈X: ¬P(x)) |
( ∃x ∈X: P(x)) |
⇔ | ¬( ∀x ∈X: ¬P(x)) |
Sometimes you can get rid of a negation by using knowledge about the constraints of a database design. For instance, if you need to query all offerings that are not canceled, you can equivalently query all offerings that are either scheduled or confirmed. If o
represents an offering tuple, then the following rewrite rule will apply:
o(STATUS) ≠ 'CANC' ⇔ o(STATUS)∈{'SCHD','CONF'}
Here you use knowledge about the attribute value set of attribute STATUS
.
If—in the course of rewriting expressions—you end up querying employees who don’t have a salary of less than 5000
, it might be better to query the employees who have a salary equal to or more than 5000
instead. The following rewrite rule applies (e
represents an employee tuple):
¬(e(MSAL) < 5000) ⇔ e(MSAL) ≥ 5000
Our brains dislike negations. Try to avoid negations by rewriting expressions into expressions containing fewer negations. This is not only true for query specifications, but also for constraint specifications (in previous chapters), and data manipulation specifications (in the next chapter).
This section provides a summary of this chapter, formatted as a bulleted list. You can use it to check your understanding of the various concepts introduced in this chapter before continuing with the exercises in the next section.
distinct
keyword.Develop both formal and SQL expressions for these queries.
Note In the following exercises, there may be queries that cannot be answered with the given database. In these cases, you should give arguments why the answer cannot be given.
10
.10
, 11
, 12
, or 15
.10
and 11
.10
.35
who have a monthly salary of more than 2000
.COURSE
is uniquely identifying in OFFR
and “no” if otherwise. Does the answer “yes” imply that COURSE
is a key of OFFR
?10
in 2006.10
at the end of 2006.DB1
, DB2
, DB3
, the duration, and for every offering (with a begin date) in 2006: begin date, status, and number of registered students.DEPTNO
and DNAME
) the number of administrators, trainers, sales representatives, and managers.500
dollars.EMPNO
and ENAME
) who attended (at least) one offering for all design courses (course category equals 'DSG'
) in 2006.18.226.181.57