CHAPTER 2

2.1 A type is a named, finite set of values—all possible values of some specific kind: for example, all possible integers, or all possible character strings, or all possible supplier numbers, or all possible XML documents, or all possible relations with a certain heading (etc., etc.). There’s no difference between a domain and a type. Note: SQL does draw a distinction between domains and types, however. In particular, it supports both a CREATE TYPE statement and a CREATE DOMAIN statement. To a first approximation, CREATE TYPE is SQL’s counterpart to the TYPE statement of Tutorial D, which I’ll be discussing in Chapter 8 (though there are many, many differences, not all of them trivial in nature, between the two). CREATE DOMAIN might be regarded, very charitably, as SQL’s attempt to provide a tiny part of the total functionality of CREATE TYPE (it was introduced in SQL:1992, while CREATE TYPE wasn’t introduced until SQL:1999); now that CREATE TYPE exists, there seems little reason to use, or even support, CREATE DOMAIN at all.

2.2 Every type has at least one associated selector; a selector is an operator that allows us to select, or specify, an arbitrary value of the type in question. Let T be a type and let S be a selector for T; then every value of type T must be returned by some successful invocation of S, and every successful invocation of S must return some value of type T. See Chapter 8 for further discussion. Note: Selectors are provided “automatically” in Tutorial D—since they’re required by the relational model, at least implicitly—but not, in general, in SQL. In fact, although the selector concept necessarily exists, SQL doesn’t really have a term for it; certainly selector as such isn’t an SQL term. Further details are beyond the scope of this book.

A literal is a “self-defining symbol”; it denotes a value that can be determined at compile time. More precisely, a literal is a symbol that denotes a value that’s fixed and determined by the symbol in question (and the type of that value is therefore also fixed and determined by the symbol in question). Here are some Tutorial D examples:

     4                              /* a literal of type INTEGER  */
     'XYZ'                          /* a literal of type CHAR     */
     FALSE                          /* a literal of type BOOLEAN  */
     5.0                            /* a literal of type RATIONAL */
     POINT ( 5.0 , 2.5 )            /* a literal of type POINT    */

(The last of these involves the user defined type POINT from the body of the chapter.)

Every value of every type, tuple and relation types included, must be denotable by means of some literal. A literal is a special case of a selector invocation; to be specific, it’s a selector invocation all of whose arguments are themselves specified as literals in turn (implying in particular that a selector invocation with no arguments at all, like the INTEGER selector invocation 4, is a literal by definition). Note finally that there’s a logical difference between a literal as such and a constant—a constant is a value, while a literal is a symbol that denotes such a value. (By the same token. there’s a logical difference between a literal and a value—as just stated, a value is a constant, such as the constant 3, while a literal is a symbol that denotes such a constant.)

2.3 A THE_ operator is an operator that provides access to some component of some “possible representation,” or possrep, of some specified value of some specified type. See Chapter 8 for further discussion. Note: THE_ operators are effectively provided “automatically” in both Tutorial D and SQL, to a first approximation. However, although the THE_ operator concept necessarily exists, SQL doesn’t exactly have a term for it; certainly THE_ operator as such isn’t an SQL term. Further details are beyond the scope of this book.

2.4 True in principle; might not be completely true in practice (but to the extent it isn’t, we’re talking about a confusion over model vs. implementation). Incidentally, the epigraph to the chapter is highly pertinent to the present exercise. Here it is again: “A major purpose of type systems is to avoid embarrassing questions about representations, and to forbid situations in which these questions might come up.” In other words, types are a good idea because they raise the level of abstraction (without a proper type system, everything would be nothing but tedious—and error prone—bit twiddling). And here’s another nice quote (this one’s from Andrew Wright: “On Sapphire and Type-Safe Languages,” CACM 46, No. 4, April 2003): “[Types make] program development and debugging easier by making program behavior more understandable.”

2.5 A parameter is a formal operand in terms of which some operator is defined. An argument is an actual operand that’s substituted for some parameter in some invocation of the operator in question. (People often use these terms as if they were interchangeable; much confusion is caused that way, and you need to be on the lookout for it.) Note: There’s also a logical difference between an argument as such and the expression that’s used to specify it. For example, consider the expression (2 + 3) - 1, which represents an invocation of the arithmetic operator “-”. The first argument to that invocation is the value five, but that argument is specified by the expression 2 + 3, which represents an invocation of the arithmetic operator “+”. (In fact, of course, every expression represents some operator invocation. Even a simple variable reference—V, say—can be regarded as representing an invocation of a certain operator: namely, the operator that returns the current value of the specified variable V.)

A database is a repository for data. (Note: In the relational world, we might say, a little more specifically, that a database is a container for relvars. But much more precise definitions are possible; one such can be found in Chapter 5 of this book. See also Appendix A.) A DBMS is a software system for managing databases; it provides data storage, recovery, concurrency, integrity, query/update, and other services.

A foreign key is a subset of the heading of some relvar, values of which must be equal to values of some “target” key in some other relvar (or possibly the same relvar). A pointer is a value (an address, essentially) for which certain special operators—notably referencing and dereferencing operators—can (and in fact must) be defined.[196] Note: Brief definitions of the referencing and dereferencing operators were given in a footnote in the body of the chapter.

A generated type is a type obtained by executing some type generator such as ARRAY, RELATION, or (in SQL) CHAR; specific array, relation, and (in SQL) character string types are thus generated types. A nongenerated type is a type that’s not a generated type.

A relation is a value; it has a type—a relation type, of course—but it isn’t itself a type. A type is a named, finite set of values: viz., all possible values of some particular kind.

Type is a model concept; types have semantics that must be understood by the user. Representation is an implementation concept; representations are supposed to be hidden from the user. In particular (and as noted in the body of the chapter), if X is a value or variable of type T, then the operators that apply to X are the operators defined for values and variables of type T, not the operators defined for the representation that applies to values and variables of type T. For example, just because the representation for type ENO (“employee numbers”) happens to be CHAR, say, it doesn’t follow that we can concatenate two employee numbers; we can do that only if concatenation is an operator that’s defined for values of type ENO. See the answer to Exercise 2.4 above for further discussion.

A system defined (or built in) type is a type that’s available for use as soon as the system is installed (it “comes in the same box the system comes in”). A user defined type is a type whose definition and implementation are provided by some suitably skilled user after the system is installed. (To the user of such a type, however—as opposed to the user who actually defines that type—that type should look and feel just like a system defined type.)

A system defined (or built in) operator is an operator that’s available for use as soon as the system is installed (it comes in the same box the system comes in). A user defined operator is an operator whose definition and implementation are provided by some suitably skilled user after the system is installed. (To the user of such an operator, however—as opposed to the user who designs and implements that operator—that operator should look and feel just like a system defined operator.) User defined operators can take arguments of either user or system defined types (or a mixture), but system defined operators can obviously take arguments of system defined types only.

2.6 A scalar type is a type that has no user visible components; a nonscalar type is a type that’s not a scalar type. Values, variables, operators, and so forth are scalar or nonscalar according as their type is scalar or nonscalar. Be aware, however, that these terms are neither very formal nor very precise, in the final analysis. In particular, we’ll meet a couple of important relations in Chapter 3 called TABLE_DUM and TABLE_DEE that are “scalar” by the foregoing definition!—or so it might be argued, at least.

2.7 Coercion is implicit type conversion. It’s deprecated because it’s error prone (but note that this is primarily a pragmatic issue; whether or not coercions are permitted has little or nothing to do with the relational model as such).

2.8 Because it muddles type and representation.

2.9 A type generator is an operator that returns a type instead of a value (and is invoked at compile time instead of run time). The relational model requires support for two such: namely, TUPLE and RELATION. Points arising:

  • Types generated by the TUPLE and RELATION type generators are nonscalar, but there’s no reason in principle why generated types have to be nonscalar. SQL in particular supports several scalar type generators (CHAR, NUMERIC, REF, and many others).

  • Type generators are known by many different names in the literature, including type constructors (the SQL term), parameterized types, polymorphic types, type templates, and generic types.

2.10 A relation is in first normal form (1NF) if and only if every tuple contains a single value, of the appropriate type, in every attribute position; in other words, every relation is in first normal form. Given this fact, you might be forgiven for wondering why we bother to talk about the concept at all (and in particular why it’s called “first”). The reason, as I’m sure you know (and as was in fact mentioned in Chapter 1), is that (a) we can extend it to apply to relvars as well as relations, and then (b) we can define a series of “higher” normal forms for relvars that turn out to be important in database design. In other words, 1NF is the base on which those higher normal forms build. But it really isn’t all that important as a notion in itself.

Note: I should add that 1NF is one of those concepts whose definition has evolved somewhat over the years. It used to be defined to mean that every tuple had to contain a single “atomic” value in every attribute position. As we’ve come to realize, however (and as I tried to show in the body of the chapter), the concept of data value atomicity actually has no objective meaning. An extensive discussion of such matters can be found in the paper “What First Normal Form Really Means” (see Appendix G).

2.11 The type of X is the type, T say, specified as the type of the result of the operator to be executed last—“the outermost operator”—when X is evaluated. That type is significant because it means X can be used in exactly (that is, in all and only) those positions where a literal of type T can appear.

2.12

      OPERATOR CUBE ( I INTEGER ) RETURNS INTEGER ;
        RETURN I * I * I ;
     END OPERATOR ;

2.13

  OPERATOR AREA_OF_R ( H LENGTH , W LENGTH ) RETURNS AREA ;
        RETURN H * W ;
     END OPERATOR ;

I’m assuming here, not unreasonably, that (a) it’s legal to multiply (“*”) a value of type LENGTH by another such value, and (b) the result of such a multiplication is a value of type AREA (another user defined type).

2.14 The following relation type is the type of the suppliers relvar S:

     RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }

The suppliers relvar S itself is a variable of this type. And every legal value of that variable—for example, the value shown in Figure 1-3 in Chapter 1—is a value of this type.

2.15 SQL definitions are given in the answer to Exercise 1.13 earlier in this appendix. Tutorial D definitions:

     VAR P BASE RELATION
       { PNO CHAR , PNAME CHAR , COLOR CHAR , WEIGHT RATIONAL , CITY CHAR }
       KEY { PNO } ;

     VAR SP BASE RELATION
       { SNO CHAR , PNO CHAR , QTY INTEGER }
       KEY { SNO , PNO }
       FOREIGN KEY { SNO } REFERENCES S
       FOREIGN KEY { PNO } REFERENCES P ;

Some differences between the SQL and Tutorial D definitions:

  • As noted in the answer to Exercise 1.13 earlier in this appendix, SQL specifies keys and foreign keys, along with table columns (and certain other items too, beyond the scope of the present discussion) all inside the same set of parentheses—a fact that makes it hard to determine exactly what the pertinent type is. (As a matter of fact, SQL doesn’t really support the concept of a relation type—or table type—at all. See Chapter 3 for further discussion.)

  • The left to right order in which columns are listed matters in SQL. See Chapter 3 for further discussion.

  • SQL tables don’t have to have keys at all.

The significance of the fact that relvar P, for example, is of a certain relation type is as follows:

  • The only values that can ever be assigned to relvar P are relations of that type.

  • A reference to relvar P can appear wherever a literal of that type can appear (as in, for example, the expression P JOIN SP), in which case it denotes the relation that happens to be the current value of that relvar at the pertinent time. (In other words, a relvar reference is a valid relational expression in Tutorial D; note, however, that an analogous remark does not apply to SQL, at least not 100 percent.) See Chapter 6 and Chapter 12 for further discussion.

One further point: As you can see, I’ve defined attribute QTY to be of type INTEGER. However, my reason for doing so is partly historical—every DBMS I know supports type INTEGER, while few DBMSs if any support the type that would really be more appropriate in the case at hand: viz., NONNEGATIVE_INTEGER (with the obvious semantics). Of course, we could make NONNEGATIVE_INTEGER a user defined type, but I don’t want to get into too much detail regarding user defined types in this book.

2.16 a. Not valid; LOCATION = CITY(‘London’). b. Valid; BOOLEAN. c. Presumably valid; MONEY (I’m assuming that multiplying a money value by an integer returns another money value). d. Not valid; BUDGET + MONEY(50000). e. Not valid; ENO > ENO(‘E2’). f. Not valid; NAME(THE_C(ENAME) || THE_C(DNAME)) (I’m assuming that type NAME has a single “possrep component” called C, of type CHAR). g. Not valid; CITY(THE_C(LOCATION) || ‘burg’) (I’m assuming that type CITY has a single “possrep component” called C, of type CHAR). Note: I’m also assuming throughout these answers that a given type T always has a selector with the same name. See Chapter 8 for further discussion.

2.17 Such an operation logically means replacing one type by another, not “updating a type” (types aren’t variables and hence can’t be updated, by definition). Consider the following. First of all, the operation of defining a type doesn’t actually create the corresponding set of values; conceptually, those values already exist, and always will exist (think of type INTEGER, for example). All the “define type” operation (the TYPE statement in Tutorial D—see Chapter 8) really does is introduce a name by which that set of values can be referenced. Likewise, dropping a type doesn’t actually drop the corresponding values, it just drops the name that was introduced by the corresponding “define type” operation. It follows that “updating a type” really means dropping the type name and then reintroducing that very same name to refer to a different set of values. Of course, there’s nothing to preclude support for some kind of “alter type” shorthand to simplify matters—and SQL does support such an operator, in fact—but invoking such a shorthand shouldn’t be thought of as “updating the type.”

2.18 The empty type is certainly a valid type; however, it wouldn’t make much sense to define a variable to be of such a type, because no value could ever be assigned to such a variable! Despite this fact, the empty type turns out to be crucially important in connection with type inheritance—but that’s a topic that’s (sadly) beyond the scope of the present book. Refer to the book Databases, Types, and the Relational Model: The Third Manifesto, by Hugh Darwen and myself (see Appendix G), if you want to know more.

2.19 Let T be an SQL type for which “=” is not defined and let C be a column of type T. Then C can’t be part of a key or foreign key, nor can it be part of the argument to DISTINCT or GROUP BY or ORDER BY, nor can restrictions or joins or unions or intersections or differences be defined in terms of it. And what about implementation constructs such as indexes? There are probably other implications as well.

Second, let T be an SQL type for which the semantics of “=” are user defined (so T is necessarily user defined itself), and let C be a column of type T. Then the effects of making C part of a key or foreign key or applying DISTINCT or GROUP BY (etc., etc.) to it will be user defined as well, at best, and unpredictable at worst.

2.20 Here’s a trivial example of such violation. Let X be the character string ‘AB ’ (note the trailing space), let Y be the character string ‘AB’, and let PAD SPACE apply to the pertinent collation. Then the comparison X = Y gives TRUE, and yet the operator invocations CHAR_LENGTH(X) and CHAR_LENGTH(Y) give 3 and 2, respectively. (Note too that even though the comparison X = Y gives TRUE, the comparison X||X = Y||Y doesn’t!) I leave the detailed implications for you to think about, but it should be clear that problems are likely to surface in connection with DISTINCT, GROUP BY, and ORDER BY operations among others (as well as in connection with keys, foreign keys, and certain implementation constructs, such as indexes).

2.21 Because (a) they’re logically unnecessary, (b) they’re error prone, (c) end users can’t use them, (d) they’re clumsy—in particular, they have a direction to them, which other values don’t—and (e) they undermine type inheritance. (Details of this last point are beyond the scope of this book.) There are other reasons too. See the paper cited earlier (in a footnote to the answer to Exercise 2.5), “Inclusion Dependencies and Foreign Keys,” for further discussion.

2.22 One answer has to do with nulls; if we “set X to null” (which isn’t really assigning a value to X, because nulls aren’t values, but never mind), the comparison X = NULL certainly doesn’t give TRUE. There are many other examples too, not involving reliance on nulls. E.g., let X be a variable of type CHAR(3), let Y be the character string ‘AB’ (no trailing space), and let NO PAD apply to the pertinent collation. Then assigning Y to X will actually set X to the string ‘AB ’ (one trailing space), and after that assignment the comparison X = Y gives FALSE. Again I leave the implications for you to think about.

2.23 No! (Which database does type INTEGER belong to?) In an important sense, the whole subject of types and type management is orthogonal to the subject of databases and database management. We might even imagine the need for a “type administrator,” whose job it would be to look after types in a manner analogous to that in which the database administrator looks after databases.

2.24 An expression represents an operator invocation, and it denotes a value; it can be thought of as a rule for computing or determining the value in question. (Incidentally, the arguments to that operator invocation are themselves specified as expressions in turn—though the expressions in question might just be simple literals or simple variable references.) By contrast, a statement doesn’t denote a value; instead, it causes some action to occur, such as assigning a value to some variable or changing the flow of control. In SQL, for example,

     X + Y

is an expression, but

     SET Z = X + Y ;

is a statement.

2.25 An RVA is an attribute whose type is some relation type, and whose values are therefore relations of that type (see Chapter 7 for further discussion). A repeating group is an “attribute” of some type T whose values aren’t values of type T—note the contradiction in terms here!—but, rather, bags or sets or sequences (or ...) of values of type T. Note: Type T here is often a tuple type (or something approximating a tuple type). In a system that allows repeating groups, for example, a file might be such that each record consists of an ENO field (employee number), an ENAME field (employee name), and a repeating group JOBHIST, in which each entry consists of a JOB field (job title), a FROM field, and a TO field (where FROM and TO are dates).

2.26 “Subquery” is an SQL term meaning, loosely, a SELECT expression enclosed in parentheses. Later chapters will elaborate (especially Chapter 12).

2.27 Regarding SQL row and table types, see Chapter 3. As for type BOOLEAN, yes, “=” does apply; TRUE is equal to TRUE and FALSE is equal to FALSE. In SQL, what’s more, “<” applies as well!—FALSE is considered to be less than TRUE (i.e., the comparison “FALSE < TRUE” returns TRUE, in SQL).



[196] A much more extensive discussion of the logical difference between foreign keys and pointers can be found in the paper “Inclusion Dependencies and Foreign Keys” (see Appendix G).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.74.205