MORE ON CANDIDATE KEYS

I explained the basic idea of candidate keys in Chapter 1, but now I want to make the concept more precise. Here first is a definition:

Definition: Let K be a subset of the heading of relvar R. Then K is a candidate key (or just key for short) for R if and only if it possesses both of the following properties:

  1. Uniqueness: No valid value for R contains two distinct tuples with the same value for K.

  2. Irreducibility: No proper subset of K has the uniqueness property.

If K consists of n attributes, then n is the degree of K.

Now, the uniqueness property is self-explanatory, but I need to say a little more about the irreducibility property. Consider relvar S and the set of attributes—let’s call it SC—{SNO,CITY}, which is certainly a subset of the heading of S that has the uniqueness property (no relation that’s a valid value for relvar S ever has two distinct tuples with the same SC value). But it doesn’t have the irreducibility property, because we could discard the CITY attribute and what’s left, the singleton set {SNO}, would still have the uniqueness property. So we don’t regard SC as a key, because it’s “too big.” By contrast, {SNO} is irreducible, and it’s a key.

Why do we want keys to be irreducible? One important reason is that if we were to specify a “key” that wasn’t irreducible, the DBMS wouldn’t be able to enforce the proper uniqueness constraint. For example, suppose we told the DBMS (lying!) that SC was a key for relvar S. Then the DBMS couldn’t enforce the constraint that supplier numbers are “globally” unique; instead, it could enforce only the weaker constraint that supplier numbers are “locally” unique, in the sense that they’re unique within the pertinent city. So this is one reason—not the only one—why we require keys not to contain any attributes that aren’t needed for unique identification purposes. Recommendation: In SQL, never lie to the system by defining as a key some column combination that you know isn’t irreducible. (By the way, you might think this recommendation rather obvious, but I’ve certainly seen it violated in practice; in fact, I’ve even seen such a violation explicitly recommended, by writers who really ought to know better.)

Now, all of the relvars we’ve seen so far have had just one key. Here by contrast are several self-explanatory examples (in Tutorial D only, for simplicity) of relvars with two or more. Note the overlapping nature of the keys in the second and third examples. Note: I assume the availability of certain user defined types in these definitions.

     VAR TAX_BRACKET BASE RELATION
       { LOW MONEY , HIGH MONEY , PERCENTAGE INTEGER }
       KEY { LOW }
       KEY { HIGH }
       KEY { PERCENTAGE } ;

     VAR ROSTER BASE RELATION
       { DAY DAY_OF_WEEK , TIME TIME_OF_DAY , GATE GATE , PILOT NAME }
       KEY { DAY , TIME , GATE }
       KEY { DAY , TIME , PILOT } ;

     VAR MARRIAGE BASE RELATION
       { SPOUSE_A NAME , SPOUSE_B NAME , DATE_OF_MARRIAGE DATE }
         /* assume no polygamy and no persons marrying */
         /* each other more than once ...              */
       KEY { SPOUSE_A , DATE_OF_MARRIAGE }
       KEY { DATE_OF_MARRIAGE , SPOUSE_B }
       KEY { SPOUSE_B , SPOUSE_A } ;

By the way, you might have noticed a tiny sleight of hand here. A key is supposed to be a set of attributes, and an attribute is supposed to be an attribute-name/type-name pair; yet the Tutorial D KEY syntax specifies just attribute names, not attribute-name/type-name pairs. The syntax works, however, because attribute names are unique within the pertinent heading, and the corresponding type names are thus specified implicitly. In fact, analogous remarks apply at various points in the Tutorial D language, and I won’t bother to repeat them every time, letting this one paragraph do duty for all.

I’ll close this section with a few miscellaneous points. First, note that the key concept applies to relvars, not relations. Why? Because to say something is a key is to say a certain integrity constraint is in effect—a certain uniqueness constraint, to be specific—and integrity constraints apply to variables, not values.[64] (By definition, integrity constraints constrain updates, and updates apply to variables, not values. See Chapter 8 for further discussion.)

Second, in the case of base relvars in particular, it’s usual, as noted in Chapter 1, to single out one key as the primary key (and any other keys for the relvar in question are then sometimes said to be alternate keys). But whether some key is chosen as primary, and if so which one, are essentially psychological issues, beyond the purview of the relational model as such. As a matter of good practice, most base relvars probably should have a primary key—but, to repeat, this rule, if it is a rule, really isn’t a relational issue as such. Certainly it isn’t inviolable.

Third, if R is a relvar, then R certainly does have, and in fact must have, at least one key. The reason is that every possible value of R is a relation and therefore contains no duplicate tuples, by definition; at the very least, therefore, the combination of all of the attributes of R—i.e., the entire heading—certainly has the uniqueness property. Thus, either that combination also has the irreducibility property, or there’s some proper subset of that combination that does. Either way, there’s certainly something that’s both unique and irreducible. Note: These remarks don’t necessarily apply to SQL tables—SQL tables allow duplicate rows and so might have no key at all. Strong recommendation: In SQL, for base tables at any rate, use UNIQUE and/or PRIMARY KEY specifications to ensure that every such table does have at least one key.

Fourth, note that key values are tuples (rows, in SQL), not scalars. In the case of relvar S, for example, with its sole key {SNO}, the value of that key for some specific tuple—say that for supplier S1—is:

     TUPLE { SNO 'S1' }

(a subtuple of the pertinent tuple—recall that every subset of a tuple is a tuple in turn). Of course, in practice we would usually say, informally, that the key value in this example is just S1—or ‘S1’, rather—but it really isn’t. And so now it should be clear just how keys, like so much else in the relational model, rely crucially on the concept of tuple equality. To spell the point out: In order to enforce some key uniqueness constraint, we need to be able to tell whether two key values are equal, and that’s precisely a matter of testing two tuples for equality—even when, as in the case of relvar S, the tuples in question are of degree one and “look like” simple scalar values.

Fifth, let SK be a subset of the heading of relvar R that possesses the uniqueness property but not necessarily the irreducibility property. Then SK is a superkey for R (and a superkey that isn’t a key is called a proper superkey). For example, {SNO} and {SNO,CITY} are both superkeys—and the latter is a proper superkey—for relvar S. Note that the heading of any relvar R is always a superkey for R, by definition.

My final point has to do with the notion of functional dependency.[65] I don’t want to get into a lot of detail regarding that concept here—I’ll come back to it in Chapter 8—but you’re probably familiar with it anyway. All I want to do here is call your attention to the following. Let SK be a superkey (possibly a key) for relvar R, and let X be any subset of the heading of R. Then the functional dependency (FD)

     SKX

holds in R, necessarily. To elaborate briefly: In general, the functional dependency SKX means that whenever two tuples of R have the same value for SK, they also have the same value for X. But if two tuples have the same value for SK, where SK is a superkey, then by definition they must be the very same tuple!—and so they must have the same value for X. In other words, loosely: We always have functional dependency arrows “out of superkeys” (and therefore out of keys in particular) to everything else in the relvar.



[64] On the other hand, it does make sense to say of some relation that it either does or does not satisfy some key constraint. We might even go further and say, a trifle sloppily, that a relation that satisfies a given key constraint actually “has” the key in question—though such a manner of speaking is likely to cause confusion, and I wouldn’t recommend it.

[65] The terms dependence and dependency are used interchangeably in the literature (and in this book), in contexts such as the one under discussion.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.17.154.16