CHAPTER 1

1.1 Here are a few examples of statements from the early part of the chapter in which every occurrence of the term relation (highlighted here in bold) should be replaced by the term relvar:

  • “[Every] relation has at least one candidate key.”

  • “[A] foreign key is a combination, or set, of attributes FK in some relation r2 such that each FK value is required to be equal to some value of some key K in some relation r1 (r1and r2 not necessarily distinct).”

  • “[The] relational assignment operator ... allows the value of some relational expression ... to be assigned to some relation.”

  • “A view (also known as a virtual relation) is a named relation whose value at any given time t is the result of evaluating a certain relational expression at that time t.”

And so on.

1.2 E. F. Codd (1923–2003) was the inventor of the relational model, among many other things. In December 2003 I published a brief tribute to him and his achievements, which you can find on the ACM SIGMOD website www.acm.org/sigmod and elsewhere. Note: An expanded version of that tribute appears in my book Date on Database: Writings 2000–2006 (Apress, 2006).

1.3 A domain can be thought of as a conceptual pool of values from which actual attributes in actual relations take their actual values. In other words, a domain is a type, and the terms domain and type are effectively interchangeable—but personally I much prefer type, as having a longer pedigree (in the computing world, at least), as well as being slightly more succinct. Domain is the term used in most of the older database literature, however. Note: Don’t confuse domains as understood in the relational world with the construct of the same name in SQL, which can be regarded at best as a very weak kind of type (see Chapter 2—in particular, the answer to Exercise 2.1 later in this appendix). Also, be aware that some older writings (including certain very early ones by myself) unfortunately and mistakenly use the term domain when what they really mean is attribute. Be on your lookout for confusion in this area.

1.4 A database satisfies the referential integrity rule if and only if for every tuple containing a reference (i.e., a foreign key value) there exists a referent (i.e., a tuple in the pertinent “target” relvar with that same value as a value for the pertinent target key). Loosely: If B references A, then A must exist. See Chapter 5 and Chapter 8 for further discussion.

1.5 Let R be a relvar. Then every relation r that can legally be assigned to R must have the same heading, and hence a fortiori the same attributes and the same degree (see Chapter 2 and Chapter 3 for further discussion); and the heading, attributes, and degree of R are, respectively, the heading, attributes, and degree of every such relation r. They can therefore (and in practice always are) specified as part of the definition of R.

Now let the relation that’s assigned to R at some particular time t be r. Then the body, tuples, and cardinality of R at that time t are, respectively, the body, tuples, and cardinality of r. Note that the body, tuples, and cardinality of a relvar vary over time, while the heading, attributes, and degree don’t.

By the way, it follows from the foregoing that if we use SQL’s ALTER TABLE to add a column to or drop a column from some base table T, then the effect, logically speaking, is to replace that table T by some distinct table T′ (the term table being, in such contexts, SQL’s counterpart to the relational term relvar). T′ is not “the same table as before”—speaking purely from a logical point of view, that is. But it’s convenient to overlook this nicety in informal contexts.

1.6 See the section MODEL vs. IMPLEMENTATION in the body of the chapter.

1.7 (a) Physical data independence is the independence of users and application programs from the way the data is physically stored and accessed. It’s a logical consequence of keeping a rigid separation between the model and its implementation. To the extent that such separation is observed, and hence to the extent that physical data independence is achieved, we have the freedom to make changes to the way the data is physically stored and accessed—typically for performance reasons—without at the same time having to make corresponding changes in queries and application programs. Such independence is desirable because it translates into protecting investment in training, applications, and logical database designs.

(b) The model is the abstract machine with which users interact; the implementation is the realization of that abstract machine on some physical computer system. Users have to understand the model, since it defines the interface they have to deal with; they don’t have to understand the implementation, because that’s under the covers (at least, it should be). The following analogy might help: In order to drive a car, you don’t have to know what goes on under the hood—all you have to know is how to steer, how to shift gear, and so on. So the rules for steering, shifting, and the rest are the model, and what’s under the hood is the implementation. (It’s true that you might drive better if you do have some understanding of what goes on under the hood, but you don’t have to know. Analogously, you might use a data model better if you have some knowledge of how it’s implemented—but ideally, at least, you shouldn’t have to know.) Note: The term architecture is sometimes used with a meaning very similar to that of model as defined here.

1.8 Rows in tables are ordered top to bottom but tuples in relations aren’t; columns in tables are ordered left to right but attributes in relations aren’t; tables might have duplicate rows but relations never have duplicate tuples. Also, relations contain values, but tabular pictures don’t (they don’t even contain “occurrences” or “appearances” of such values); rather, they contain symbols that denote such appearances—for example, the symbol (or numeral) 5, which denotes an appearance of the value five. See the answer to Exercise 3.5 later in this appendix for several further differences.

1.9 No answer provided.

1.10 Throughout this book I use the term relational model to mean the abstract machine originally defined by Codd (though that abstract machine has been refined, clarified, and extended somewhat since Codd’s original vision). I don’t use the term to mean just a relational design for some particular database. There are lots of relational models in the latter sense but only one in the former sense. (As noted in the body of the chapter, you can find quite a lot more on this issue in Appendix A.)

1.11 Here are some:

  • The relational model has nothing to say about “stored relations” at all; in particular, it categorically doesn’t say which relations are stored and which not. In fact, it doesn’t even say that relations as such have to be stored—there might be a better way to do it (and indeed there is, though the specifics are beyond the scope of this book).

  • Even if we agree that the term “stored relation” might make some kind of sense—meaning a user visible relation that’s represented in storage in some direct and efficient manner, without getting too specific on just what direct and efficient might mean—which relations are “stored” should be of no significance whatsoever at the relational (i.e., user) level of the system. In particular, the relational model categorically does not say that “tables” (meaning, more specifically, base tables, or rather base relations) are stored and views aren’t.

  • The extract quoted doesn’t mention the crucial logical difference between relations and relvars.

  • The extract also seems to assume that table and base table are interchangeable terms (and concepts)—a very serious error, in my opinion.

  • The extract also seems to distinguish between tables and relations (and/or relvars). If “table” means, specifically, an SQL table, then I certainly agree there are some important distinctions to be observed, but they’re not the ones the extract seems to be interested in.

  • “[It’s] important to make a distinction between stored relations ... and virtual relations”: Actually, it’s extremely important from the user’s perspective (and from the perspective of the relational model, come to that) not to make any such distinction at all!

1.12 Here are a few things that are wrong with it:

  • The relational model as such doesn’t “define tables” at all, in the sense meant by the extract quoted. It doesn’t even “define” relations (or relvars, rather). Instead, such definitions are supplied by some user. And anyway: What’s a “simple” table? Are there any complex ones?

  • What on earth does the phrase “each relation and many to many relationships” mean? What does it mean to “define tables” for such things?

  • The following concepts aren’t part of the model, so far as I know: entities, relationships between entities, linking tables, “cross-reference keys.” (It’s true that Codd’s original model had a rule called “entity integrity,” but that name was only a name, and I reject that rule in any case.) It’s also true that it’s possible to put some charitable interpretations on all of these terms, but the statements that result from such interpretations are usually wrong. For example, relations don’t always represent “entities” (what “entity” is represented by the relation that’s the projection of suppliers on {STATUS,CITY}?).

  • Primary and secondary indexes and rapid access to data are all implementation notions—they’re nothing to do with the model. In particular, primary (or, more generally, candidate) keys shouldn’t be equated with “primary indexes.”

  • “Based upon qualifications”? Would it be possible to be a little more precise? It’s truly distressing, in the relational context above all others (where precision of thought and articulation was always a key objective), to find such dreadfully sloppy phrasing. Well, yeah, you know, a relation is kind of like a table, or a kind of a table, or something ... if you see what I mean.

  • Finally, what about the operators? It’s an all too common error to think the relational model has to do with structure only and to forget about the operators. But the operators are crucial! As Codd himself once observed: “Structure without operators is ... like anatomy without physiology.”

As a kind of postscript to the foregoing, I remark that the relational model certainly seems to have received more than its fair share of misunderstanding or misrepresentation in the literature over the years. Here are a few more quotes to illustrate the point:

  • Relational model: A scheme for defining databases in which data elements are organized into relations, typically viewed as rows in tables.” As I wrote when I first had occasion to comment on this “definition” (which appears in a book on object technology): Never mind the inaccuracies—you mean that’s it? What about the operators? What about integrity? What about declarative query? What about views? What about the model’s set level nature? What about optimization? And so on and so forth.

  • “A newer form of database manager, the relational model, ... [removes] information about complex relationships from the database ... Although the relational model is much more flexible than its predecessors, it pays a price for this flexibility. The information about complex relationships that was removed from the database must be expressed as procedures in every program that accesses the database, a clear violation of the independence required for modularity.” I’ll leave comments on this one to you (it’s taken from that same book on object technology).

  • “Consider a data relationship in which a part can have multiple suppliers and vice versa ... There are two base tables: a part table and a supplier table. Then there is a cross-reference table from part to supplier and another cross-reference table from supplier to part” (emphasis added). This quote is from what has to be one of the worst textbooks I’ve ever read.

1.13 Here are some possible CREATE TABLE statements. Regarding the column data types, see Chapter 2. Note: These CREATE TABLE statements, along with their Tutorial D counterparts, are repeated in Chapter 5, where further pertinent discussion can be found. See also the answer to Exercise 2.15 later in this appendix.

     CREATE TABLE S
       ( SNO    VARCHAR(5)   NOT NULL ,
         SNAME  VARCHAR(25)  NOT NULL ,
         STATUS INTEGER      NOT NULL ,
         CITY   VARCHAR(20)  NOT NULL ,
         UNIQUE ( SNO ) ) ;

     CREATE TABLE P
       ( PNO    VARCHAR(6)   NOT NULL ,
         PNAME  VARCHAR(25)  NOT NULL ,
         COLOR  CHAR(10)     NOT NULL ,
         WEIGHT NUMERIC(5,1) NOT NULL ,
         CITY   VARCHAR(20)  NOT NULL ,
         UNIQUE ( PNO ) ) ;

     CREATE TABLE SP
       ( SNO    VARCHAR(5)   NOT NULL ,
         PNO    VARCHAR(6)   NOT NULL ,
         QTY    INTEGER      NOT NULL ,
         UNIQUE ( SNO , PNO ) ,
         FOREIGN KEY ( SNO ) REFERENCES S ( SNO ) ,
         FOREIGN KEY ( PNO ) REFERENCES P ( PNO ) ) ;

Note that SQL encloses the column definitions and the key and foreign key specifications all inside the same set of parentheses (contrast this with what Tutorial D does—again, see Chapter 2 and Chapter 5). Note too that by default SQL columns permit nulls; if we want to prohibit them, therefore (and I do), we have to specify an explicit constraint to that effect. There are various ways of defining such a constraint; specifying NOT NULL as part of the column definition is probably the easiest.

1.14 Tutorial D (I can’t show this in SQL, because SQL doesn’t support relational assignment):

SP := SP UNION RELATION { TUPLE { SNO 'S5' , PNO 'P6' , QTY 250 } } ;

The text between the keyword UNION and the closing semicolon is a relation selector invocation (see Chapter 3), and it denotes the relation that contains just the tuple to be inserted. See Chapter 5 for further discussion.

1.15 I’ll give an answer here for completeness (Tutorial D again), but I’ll defer detailed explanations to Chapter 7:

     WITH ( R1 := S WHERE CITY = 'Paris' ,
            R2 := EXTEND R1 : { STATUS := 25 } ) :
     S := ( S MINUS R1 ) UNION R2 ;

1.16 First consider the generic assignment:

     R := rx ;

Here R is a relvar name and rx is a relational expression, denoting the relation to be assigned to relvar R. An SQL analog might look like this—

     DELETE FROM T ;
     INSERT INTO T (...) tx ;

—where T is an SQL table corresponding to relvar R and tx is an SQL table expression corresponding to the relational expression rx. Note the need for the preliminary DELETE; note too that anything could happen, loosely speaking, between that DELETE and the subsequent INSERT, whereas there’s no notion in the relational case of there being anything “between the DELETE and the INSERT” (the assignment is a semantically atomic operation). In other words, the foregoing DELETE / INSERT combination, unlike the assignment it’s trying to mimic, is a sequence of two distinct statements. One implication of this fact is that a failure could occur between those two statements, something that couldn’t happen with the assignment as such.

As for the question “Can all relational assignments be expressed in terms of INSERT and/or DELETE and/or UPDATE?”, the answer is yes (though in fact we don’t need UPDATE as such at all). To be specific, the generic assignment—

     R := rx ;

—is logically equivalent to:

     R := ( R MINUS ( R MINUS rx ) ) UNION ( rx MINUS R ) ;

To elaborate slightly, let d and i be the relations denoted by the expressions R MINUS rx and rx MINUS R, respectively. Then the original assignment is logically equivalent to the following one:

     R := ( R MINUS d ) UNION i ;

This latter assignment is effectively equivalent to deleting d from R and then inserting i into R. Do note, however, that the DELETE and INSERT in question are both done as part of the same statement, not as two separate statements. See the discussion of multiple assignment in Chapter 8.

There’s another point I need to clear up here, too. In the body of the chapter, I said that SQL doesn’t support relational assignment directly, and that’s true. However, one reviewer of that chapter objected that, for example, the following SQL expression “could be thought of as relational assignment” (I’ve simplified the reviewer’s example somewhat):

     SELECT LS.*
     FROM ( SELECT SNO , SNAME , STATUS
            FROM   S
            WHERE  CITY = 'London' ) AS LS

In effect, the reviewer was suggesting that this expression is assigning some table value to a table variable called LS. But it isn’t. In particular, it isn’t possible to go on and do further queries or updates on LS; LS isn’t an independent table in its own right, it’s just a temporary table that’s conceptually materialized as part of the process of evaluating the overall SELECT expression. That expression is not a relational assignment. (In any case, assignment of any kind is a statement, not an expression. Statement vs. expression is another of those important logical differences. See Exercise 2.24 in Chapter 2.)

And one further point: The SQL standard supports a variant of CREATE TABLE, “CREATE TABLE AS,” that allows the base table being created to be initialized to the result of some query, thereby not only creating the table in question but also assigning an initial value to it. Once initialized, however, the table in question behaves just like any other base table; thus, CREATE TABLE AS doesn’t really constitute support for relational assignment, as such, either.

1.17 The discussions that follow are based on more extensive ones to be found in my book An Introduction to Database Systems (see Appendix G).

Duplicate tuples: Essentially, the concept makes no sense. Suppose for simplicity that the suppliers relvar had just two attributes, SNO and CITY, and suppose it contained a tuple showing that “it’s a true fact” that supplier S1 is located in London. Then if it also contained a duplicate of that tuple (if that were possible), it would simply be informing us of that same “true fact” a second time. But as Chapter 4 observes, if something is true, saying it twice doesn’t make it more true! For further discussion, see Chapter 4 or the paper “Double Trouble, Double Trouble” mentioned in Appendix G.

Tuple ordering: The lack of tuple ordering means there’s no such thing as “the first tuple” or “the fifth tuple” or “the 97th tuple” of a relation, and there’s no such thing as “the next tuple”; in other words, there’s no concept of positional addressing, and no concept of “nextness.” If we did have such concepts, we would need certain additional operators as well—for example, “retrieve the nth tuple,” “insert this tuple here,” “move this tuple from here to there,” and so on. As a matter of fact (to lift some text from Appendix A), it’s axiomatic that if we have n different ways to represent information, then we need n different sets of operators.[195] And if n > 1, then we have more operators to implement, document, teach, learn, remember, and use (and choose among). But those extra operators add complexity, not power! There’s nothing useful that can be done if n > 1 that can’t be done if n = 1.

By the way, another good argument against ordering (of any kind) is that positional addressing is fragile—the addresses change as insertions and deletions are performed.

Attribute ordering: The lack of attribute ordering means there’s no such thing as “the first attribute” or “the second attribute” (and so on), and there’s no “next attribute” (i.e., there’s no concept of “nextness”)—attributes are always referenced by name, never by position. As a result, the scope for errors and obscure programming is reduced. For example, there’s no way to subvert the system by somehow “flopping over” from one attribute to another. This situation contrasts with that found in certain programming systems, where it might be possible to exploit the physical adjacency of logically discrete items, deliberately or otherwise, in a variety of subversive ways. Note: Many other negative consequences of attribute ordering (or column ordering, rather, in SQL) are discussed in subsequent chapters. See also the paper “A Sweet Disorder,” mentioned in Appendix G.

In the interest of accuracy, I should add that for reasons that don’t concern us here, relations in mathematics, unlike their counterparts in the relational model, do have a left to right ordering to their attributes. A similar remark applies to tuples also. See Appendix A for further discussion.



[195] Note that tuple ordering does indeed constitute a way of representing information—namely, by position; that is, the fact that a given tuple appears here and not there certainly does represent information of some kind.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.77.76