images

Note from the author: Much of the text in this appendix was originally part of Chapter 3—the last chapter that Lex had been able to spend time on at the end of 2005. Being called “Mr. NULL,” as some of you might know, he produced quite a bit of text on the topic of NULLs and 3-valued logic (3VL) as part of Chapter 3. By the time Chapter 3 review comments were in and I (Toon) started working on them, I decided that it would be better, for Chapter 3, to move all text with regards to NULLs and 3VL out of Chapter 3 and into this appendix.

During his career, my coauthor dedicated quite a bit of time to investigating the phenomenon of a NULL “value” in SQL DBMSes and to researching the 3 (or more) valued logic that arises from it. I just used NULLs (cautiously) in my job as a database professional. Because it was difficult to integrate my perspective on this subject into Lex’s text, I chose to split this appendix into two sections.

In the first section (“To Be Applicable or Not”), I’ll give you my thoughts on NULLs in SQL DBMSes from a database design point of view.

The second section, “Three (or More) Valued Logic,” consists of a “cut and paste” of Lex’s text on 3VL from the original version of Chapter 3. I did incorporate the review comments (from Jonathan Gennick and Chris Date), which Lex never had the time to study, into his text.

To Be Applicable or Not

As you saw in Chapter 11, the example database design introduced in this book hardly contains any nullable attributes; all attributes in all table structures, with the exception of TRAINER in OFFR, and COMMENTS in TERM, are mandatory (NOT NULL, in SQL terminology). You’ll probably find a lot more nullable attributes in the database designs that you deal with in your daily job, right?

Most likely the main reason why an attribute is left optional is the inapplicability of the value that could have been stored in it for certain rows. Let me explain this. Optional attributes are usually introduced in database designs for one of the following three reasons:

  • The data values that could have been stored are inapplicable.
  • The data values are not yet applicable.
  • The values are nice to know (if known), but not mandatory.

The next three sections will briefly discuss these reasons.

Inapplicable

Inapplicable data values arise from the failure to design specializations. This is the main reason of the manifestation of optional attributes in the implementation of a database design. By not introducing separate table structures to hold the attributes that are only applicable for a certain subset of the tuples in a given table structure, you’ll end up with optional attributes.

In the example database design in Chapter 7, we specifically introduced SREP and MEMP to prevent the following optional attributes in EMP:

  • TARGET and COMM: Only applicable for (the subset of) sales representatives
  • MGR: Only applicable for managed employees; that is, everybody but the president

Frequently, you’ll find that neither SREP nor MEMP are introduced, and that attributes TARGET, COMM, and MGR are nullable attributes in the EMP table structure. With a bit of luck, all the following four CHECK constraints accompany them:

CHECK( JOB<>'SALESREP' or (TARGET is not null and COMM is not null) )
CHECK( JOB='SALESREP' or (TARGET is null and COMM is null) )
CHECK( JOB='PRESIDENT' or MGR is not null )
CHECK( JOB<>'PRESIDENT' or MGR is null )

The first two constraints represent the following two implications, specifying when the TARGET and COMM attributes should hold a value and when not:

( JOB = 'SALESREP' ) ⇒ ( TARGET is not null and COMM is not null )
( JOB <> 'SALESREP' ) ⇒ ( TARGET is null and COMM is null )

Note that the implications have been rewritten into disjunctions.

The last two CHECK constraints represent the implications specifying when the MGR attribute should hold a value and when not:

( JOB <> 'PRESIDENT' ) ⇒ ( MGR is not null )
( JOB = 'PRESIDENT' ) ⇒ ( MGR is null )

imagesNote Often you’ll find that the second and fourth CHECK constraints are erroneously left unspecified.

The preceding four CHECK constraints would implement the applicability requirements of these specialization attributes, given that they are part of the EMP table structure. They are the counterparts of database constraints PSPEC1 and PSPEC2 found in the example database design.

Not Yet Applicable

Not yet applicable data values also arise from failing to design specializations. As we mentioned in Chapter 7, the example database design holds two rather tricky hacks. The attribute-value sets of the TRAINER attribute in the OFFR table structure, and the EVAL attribute in the REG table structure, hold a special -1 value. In both cases, this value is used to signify that the attribute value is not yet applicable.

Apparently there are two sorts of offerings: offerings with a trainer assigned to them and offerings without one assigned to them. The following tuple constraint, taken from the example database universe specification, describes when a trainer needn’t be assigned yet (here o represents an offering tuple):

/* Unassigned TRAINER allowed only for canceled and scheduled offerings */
o(TRAINER) = -1 ⇒ o(STATUS) ∈ {'CANC','SCHD'}

Note that this tuple constraint specifies that confirmed offerings must have an assigned trainer, and canceled or scheduled offerings may have an assigned trainer.

We can make a similar remark about registrations: some registrations include an evaluation score for the course offering, and some of them don’t yet. In this case, no constraint describes exactly when registrations hold an evaluation and when not. The table constraint specified in the table universe tab_REG does state that whenever one registration of an offering holds an evaluation, then all registrations of that offering should hold an evaluation. Here’s the specification of that constraint (R represents a registration table):

/* Offering is evaluated by all attendees, or it is too early to */
/* evaluate the offering */
( ∀r1,r2∈R:
  ( r1↓{COURSE,STARTS} = r2↓{COURSE,STARTS} )
    ⇒
  ( ( r1(EVAL) = -1 ∧ r2(EVAL) = -1 ) ∨
    ( r1(EVAL) ≠ -1 ∧ r2(EVAL) ≠ -1 )
) )

In the actual implementation of these kinds of designs, you’ll often see that the special -1 value from the formal specification has been transformed to a NULL in an SQL DBMS.

However, in a properly designed database, we should have created a specialization for both the offering table design and the registration table design such that each holds the attribute that is not always applicable in the current database design (TRAINER for offering and EVAL for registration). The attribute-value sets of these two attributes then do not hold the special -1 value, and in the implementation these specialization attributes will have become mandatory attributes. In case of the offering specialization, you should then also specify an appropriate database constraint replacing the tuple constraint that currently describes when a trainer must be assigned.

Nice to Know

The third reason why optional attributes are introduced in database designs is a subtle variation of the preceding reason. When optional attributes are introduced due to the second reason, you’ll typically find one or more data integrity constraints at the tuple, table, or database level that precisely describe when the attribute value is indeed applicable. In the absence of any such constraint, we regard the value of an optional attribute as “nice to know.”

In the implementation of the example database design, the COMMENTS attribute of the TERM table structure is an example of this category. The users apparently just allow a missing value for this attribute.

Implementation Guidelines

If an optional attribute is of the category “nice to know” (and thus not involved in any constraints), then chances are that this attribute is also not involved in any business logic. It’s probably an attribute that is predominantly used in reports, just to be printed on paper, or displayed on screen. As such, the NULLs stored in the column seldom give rise to the various issues surrounding the use of NULLs described in the next section. It is more convenient just to deal with the optional attribute, instead of creating a separate specialization table that holds the attribute as a mandatory attribute, and which would require the use of outer joins to always try to retrieve the value.

imagesNote If you’re unfamiliar with the concept of an outer join, we refer you to the Oracle documentation.

It is our opinion that inapplicable data values—the ones introduced due to the first reason—should be avoided at all times. Our main argument for this (besides the avoidance of the issues described in the next section) is that introducing specializations for these attributes benefits the clarity of the overall database design. This is particularly the case when there is not just a single attribute, but a whole group of attributes that would be transferred to one or more specialization tables.

The cases of not yet applicable data values (the second reason) often give rise to heated debates as to how these should be dealt with. From a true relational point of view, you again would have to avoid them by creating specializations accompanied with the relevant constraints describing when tuples should be present in the specializations. If you decide not to, then you should not fail both to specify and implement the accompanying constraints that describe when the values are indeed applicable and when not.

Three (or More) Valued Logic

In this section, Lex briefly explores 3VL. He introduces you to the three-valued truth tables for the three connectives AND, OR, and NOT. He further investigates some problems around the implication connective in 3VL, and various other issues that arise in 3VL.

Unknown

The most crucial assumption in 3VL is that (besides the two values TRUE and FALSE) there is a third value to represent “possible” or (still) UNKNOWN. To avoid misunderstandings, let’s start with the following statement: there is nothing wrong with 3-valued (or multi-valued) logic. Many mathematicians have explored this area in the past; one of them worth mentioning is Jan Łukasiewicz (1878–1956).

imagesNote Łukasiewicz also introduced the Polish notation, which allows expressions to be written unambiguously without the use of brackets. This is the basis of the Reverse Polish Notation (RPN), which many pocket calculators and expression compilers use.

In 3VL, predicates can change their truth values in time, but only from UNKNOWN to TRUE or from UNKNOWN to FALSE. This is rather slippery, because it means that in 3VL the value UNKNOWN seems to be somewhat special. This description is too vague anyway; we need formal rules, definitions, and truth tables. We’ll look at those details in a minute.

3VL is counterintuitive, as opposed to the classical 2-valued logic (2VL). The main reason is that you lose the tertium non datur (the principle of the excluded middle). That is, if P is unknown, then ¬P is unknown as well, which in turn means that P ∨ ¬P is unknown. Consider the following famous Aristotelian example. “Tomorrow a sea battle will take place” is neither TRUE nor FALSE—it is unknown—yet the sentence “Either tomorrow there will be a sea battle, or tomorrow there won’t be a sea battle” is certainly true (in reality). However, it is unknown in 3VL, hence the counterintuitive nature.

Another problem arises from the fact that if two predicates are “possible” (that is, their truth value could possibly be TRUE), then their conjunction is also “possible.” However, this is obviously wrong if one of them is a negation of the second. Ludwik Stefan Borkowski (1914–1993) proposed to “fix” this problem by introducing a 4-valued logic (4VL); this indeed makes the preceding problem disappear, but the solution is unsatisfactory and ad hoc.

Furthermore, 3VL does not solve the problem of handling missing information; it does not allow you to represent the reason why information is missing. At the end of this section, you’ll see that Ted Codd proposed a 4VL in 1990, to distinguish between applicable and inapplicable values. However, a 4VL doesn’t tell you what to do if you don’t know whether a certain attribute value is applicable or not; for example, if commission is applicable for sales reps only, but you don’t know the job of a certain employee, what should you do with the commission attribute of that employee? One NULL implies 3VL, two NULLs (applicable and inapplicable) imply 4VL, nNULLs imply (n+2)-valued logic. Using the technique of full induction, you can prove that you’ll end up with an infinite number of (meanings of) NULLs, and a corresponding infinite number of truth values. To repeat part of this paragraph’s opening sentence: 3VL does not solve the problem of handling missing information.

Even worse, 3VL is not only counterintuitive in itself; it is ill-implemented in the SQL standard. Probably one of the biggest blunders in the SQL standard in this respect is that in an attribute of the BOOLEAN data type, a NULL represents the logical value UNKNOWN. Wasn’t a NULL supposed to represent the fact that you don’t know the value? There are more examples of such mistakes in the SQL standard.

You’ll have to revisit all tautologies (and therefore all rewrite rules) from 2VL to check whether they’re still valid in 3VL; you’ll find out that several tautologies don’t hold anymore.

Actually, you could even argue that we should use different names and symbols for the connectives in 3VL; after all, they are different from the “corresponding” connectives in 2VL.

Truth Tables of Three-Valued Logic

Tables D-1 through D-3 show the three standard truth tables for the negation, conjunction, and disjunction connectives in 3VL.

Table D-1. Three-Valued Truth Table for NOT (Negation)

P ¬P
T F
U U
F T

¬P is still FALSE if P is TRUE, and vice versa. However, the negation is not the complement operator anymore, because the negation of UNKNOWN is UNKNOWN.

Table D-2. Three-Valued Truth Table for AND (Conjunction)

P Q P ∧Q
T T T
T U U
T F F
F T F
F U F
F F F
U T U
U U U
U F F

As you can see, PQ is only TRUE if both P and Q are TRUE. The result is FALSE if at least one of the operands is FALSE, and the three remaining combinations result in UNKNOWN.

Table D-3. Three-Valued Truth Table for OR (Disjunction)

P Q P ∨Q
T T T
T U T
T F T
F T T
F U U
F F F
U T T
U U U
U F U

P ∨ Q is only FALSE if both P and Q are FALSE. The result is TRUE if at least one of the operands is TRUE, and there are again three combinations with result UNKNOWN.

Missing Operators

You almost never see a truth table for the implication and the equivalence in 3VL. The most likely reason for the absence of the implication is that it is tough to come up with a good definition. It turns out that in 3VL, the implication connective is not definable in terms of conjunction and negation, or disjunction and negation. Remember the following implication rewrite rule from 2VL:

(P ⇒ Q) ⇔ (¬P ∨ Q )

If you’d define the implication in 3VL according to this rewrite rule, you’d end up in the situation that PP would evaluate to UNKNOWN if P is UNKNOWN; obviously, in any logic you would like this to be TRUE. The last row in Table D-4 shows the problem.

Table D-4. An attempt to Define Three-Valued Implication and Equivalence Connectives

P Q P ⇒ Q P ⇔ Q
T T T T
T F F F
T U U U
F T T F
F F T T
F U T U
U T T U
U F U U
U U U? U?

Also, if P ⇒ Q and Q ⇒ P, you want P to be equivalent with Q. These problems are all related to the fact that the negation is not the complement operator anymore. Actually, Jan Łukasiewicz proposed a truth table for the implication where the two U? values in the last row are both a T instead—so he “fixed” the problem by sacrificing a tautology (the implication rewrite rule).

Three-Valued Logic, Tautologies, and Rewrite Rules

If you want to use any of the rewrite rules we identified in 2VL, you’ll have to check them first against the truth tables of 3VL; as you’ll find out, several tautologies and rewrite rules from 2VL are not necessarily also valid in 3VL. Listing D-1 shows two examples of such tautologies.

Listing D-1. Examples of 2VL Tautologies That Are Not Valid in 3VL

P ∨ ¬P
P ⇔ P

Handling Three-Valued Logic

Consider the following two rather obvious statements (in an SQL context):

  • A row shows up in a table (or in the result of a query) or it doesn’t.
  • A constraint gets violated or it doesn’t.

There are no other possibilities; therefore, these are two-valued issues. However, in 3VL, predicates have three possible outcomes: TRUE, FALSE, or UNKNOWN. Unfortunately, there is no intuitive interpretation for this third possible outcome. To give you an idea of the inconsistencies, look at Table D-5 where you see the difference between evaluating the predicate, say P, of a WHERE clause of an SQL query compared with checking the predicate of an SQL CHECK constraint.

Table D-5. Predicate Evaluation in SQL:Queries vs. Constraints

P(row) WHERE P(row) CHECK(P(row))
T Row accepted Row satisfies constraint
F Row rejected Row violates constraint
U Row rejected Row satisfies constraint

As you can see, FALSE and UNKNOWN lead to the same result for WHERE clauses in queries, whereas TRUE and UNKNOWN lead to the same effect in constraints. The third line in the preceding table shows the discrepancy; a row will violate the predicate (and therefore not be retrieved) when the predicate is part of a WHERE clause. However, that row will satisfy the predicate when evaluated by a CHECK constraint that is defined on a table that receives this row.

If you want to get control over 3VL in ISO-standard SQL, it is a good idea to think in terms of the three SQL functions introduced in Table D-6. They allow you to create a 2VL layer on top of 3VL, thus providing more explicit control over the three Boolean values TRUE, FALSE, and UNKNOWN.

Table D-6. The IS {TRUEFALSEUNKNOWN} Operators in SQL

P IS TRUE(P) IS FALSE(P) IS UNKNOWN(P)
T T F F
F F T F
U F F T

Note that UNKNOWN does not appear in the last three columns of Table D-6. Therefore, these three operators enable you to map three-valued expressions to 2VL.

Listing D-2 shows some rewrite rules based on the operators defined in Table D-6.

Listing D-2. Some Rewrite Rules in 4VL Using the IS Operators

IS TRUE ( P ) ⇔ ( P ∧ ¬ ( IS UNKNOWN (P) ) )
IS FALSE ( P ) ⇔ ( ¬P ∧ ¬ ( IS UNKNOWN (P) ) )
IS TRUE  ( ¬P ) ⇔ ( IS FALSE (P) )
IS FALSE ( ¬P ) ⇔ ( IS TRUE (P) )
IS TRUE ( P ∧ Q ) ⇔ ( IS TRUE (P) ∧ IS TRUE (Q) )
IS TRUE ( P ∨ Q ) ⇔ ( IS TRUE (P) ∨ IS TRUE (Q) )
IS FALSE ( P ∧ Q ) ⇔ ( IS FALSE (P) ∨ IS FALSE (Q) )
IS FALSE ( P ∨ Q ) ⇔ ( IS FALSE (P) ∧ IS FALSE (Q) )
IS TRUE ( ∃x∈S: P ) ⇔ ( ∃x∈S: (IS TRUE (P) ) )
IS FALSE ( ∃x∈S: P ) ⇔ ( ∀x∈S: (IS FALSE (P) ) )
IS FALSE ( ∀x∈S: P ) ⇔ ( ∃x∈S: ( IS FALSE (P) ) )
IS UNKNOWN ( ∃x∈S: P ) ⇔ ( ¬∃x∈S: ( IS TRUE (P) ) ∧ ∃y∈S: (IS UNKNOWN (P) ) )
IS UNKNOWN ( ∀x∈S: P ) ⇔ ( ¬∃x∈S: ( IS FALSE (P) ) ∧ ∃y∈S: (IS UNKNOWN (P) ) )

You can prove all these tautologies by using three-valued truth tables, or by using a combination of 3VL rewrite rules you proved before.

Four-Valued Logic

In E. F. Codd’s The Relational Model for Database Management: Version 2 (Addison-Wesley, 1990), he proposes a revision of the first version of the relational model, RM/V1. Earlier, in 1979, he presented a paper in Tasmania with the title “Extending the Database Relational Model to Capture More Meaning,” naming the extended version RM/T (T for Tasmania). The features of RM/T were supposed to be gradually incorporated into the sequence of versions RM/V2, RM/V3, and so on.

The most debatable sections of the RM/V2 book are Chapter 8 (“Missing Information”) and Section 12.4 (“Manipulation of Missing Information”). In these sections, Codd proposes a 4VL in an attempt to make a distinction between the two most common reasons why information is missing: applicable and inapplicable, represented with the two discernible values A and I, respectively. The truth tables he provides appear in Table D-7.

Table D-7. Four-Valued Truth Tables for NOT, OR, and AND

images

Note that the last two truth tables use a slightly different format from all other truth tables you saw so far in this book. They are formatted as a matrix where the four rows represent the four possible truth values for P, and the four columns represent the four possible values for Q, respectively. This can be done because the disjunction and conjunction connectives are commutative; in the regular truth table format, those two truth tables would have needed sixteen rows each.

Note that introducing the two discernible values A and I also implies that you have to revisit the outcome of arithmetic operators and string concatenation; Ted Codd proposed the behavior shown in Listing D-3, where x denotes a regular numeric attribute value and s denotes a regular character string value.

Listing D-3. Arithmetic Operators and Concatenation Revisited in 4VL

a + a = a    a || a = a
i + i = i    i || i = i
x + a = a    s || a = a
a + i = i    a || i = i
x + i = i    s || i = i

As stated before, the whole idea of introducing two discernible values (a and i) to replace the NULL (and the corresponding 4VL) must be considered as one of Ted Codd’s few mistakes, in hindsight. But it is always easy to criticize afterwards; the importance of Ted Codd’s original paper cannot be overestimated.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.20.156