Appendix A. The Relational Model

I believe quite strongly that if you think about the issue at the appropriate level of abstraction, you’re inexorably led to the position that databases must be relational. Let me immediately try to justify this very strong claim![173] My argument goes like this:

  • First of all, we saw in Chapter 5 that a database, despite the name, isn’t really just a collection of data; rather, it’s a collection of “true facts,” or (rather more respectably, since “facts” are supposed to be true by definition) true propositions—for example, the proposition “Joe’s salary is 50K.”

  • Propositions like “Joe’s salary is 50K” are easily encoded as ordered pairs—e.g., the ordered pair (Joe,50K), in the case at hand (where “Joe” is a value of type NAME, say, and “50K” is a value of type MONEY, say).

  • But we don’t want to record just any old propositions; rather, we want to record all propositions that happen to be true instantiations of certain predicates. In the case of “Joe’s salary is 50K,” for example, the pertinent predicate is “x’s salary is y,” where x is a value of type NAME and y is a value of type MONEY.

  • In other words, we want to record the extension of the predicate “x’s salary is y” (that is, the set of all of its true instantiations), which we can do in the form of a set of ordered pairs.

  • But a set of ordered pairs is, precisely, a binary relation, in the mathematical sense of that term. Here’s the definition:

    Definition: A (mathematical) binary relation over two sets A and B is a subset of the cartesian product of A and B; in other words, it’s a set of ordered pairs (a,b), such that the first element a is a value from A and the second element b is a value from B.

  • A binary relation in the foregoing sense can be depicted as a table. Here’s an example:

    [Figure: table depicting a set of (name, salary) pairs]

    (As an aside, I remark that this particular example is not just a relation but a function, because each person has just one salary. A function is a special case of a binary relation.) So we can regard this picture as depicting a subset of the cartesian product of the set of all names (“type NAME”) and the set of all money values (“type MONEY”), in that order.
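The argument so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny finite sets NAME and MONEY stand in for the corresponding types, the helper names are hypothetical, and only the pair (Joe,50K) comes from the text; the other pairs are made up for the example.

```python
from itertools import product

# Hypothetical finite stand-ins for the types NAME and MONEY, kept small
# so that the cartesian product can actually be enumerated.
NAME = {"Joe", "Amy", "Sue"}
MONEY = {50_000, 60_000, 70_000}

# Extension of the predicate "x's salary is y": one ordered pair per true
# proposition. Only (Joe, 50K) is from the text; the rest is invented.
salary = {("Joe", 50_000), ("Amy", 60_000), ("Sue", 60_000)}


def is_binary_relation(pairs, a, b):
    """Return True if `pairs` is a subset of the cartesian product a x b."""
    return set(pairs) <= set(product(a, b))


def is_function(pairs):
    """Return True if no first element is paired with two different second elements."""
    firsts = [x for x, _ in set(pairs)]
    return len(firsts) == len(set(firsts))


print(is_binary_relation(salary, NAME, MONEY))
print(is_function(salary))
print(is_function(salary | {("Joe", 60_000)}))
```

With this sample data the first two checks succeed, and the third fails because Joe would then have two salaries: the set is still a binary relation, but no longer a function.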

Given the argument so far, then, we can see we’re talking about some fairly humble (but very solid) beginnings. However, in 1969-1970, Codd realized that:

  • We need to deal with n-adic, not just dyadic, predicates and propositions (e.g., “Joe has salary 50K, works in department D4, and was hired in 1993”). So we need to deal with n-ary relations, not just binary ones, and n-tuples (tuples for short), not just ordered pairs.

  • Left to right ordering might be acceptable for pairs but soon gets unwieldy for n > 2; so let’s replace that ordering concept by the concept of attributes (identified by name), and let’s redefine the relation concept accordingly. The example now looks like this:

    [Figure: the same example as a table whose columns are identified by attribute name]

    From this point forward, then, you can take the term relation to mean a relation in this revised and extended sense, barring explicit statements to the contrary.

  • Data representation alone isn’t the end of the story—we need operators for deriving further relations from the given (“base”) ones, so that we can do queries and the like (e.g., “Get all persons with salary 60K”). But since a relation is both a logical construct (the extension of a predicate) and a mathematical one (a special kind of set), we can apply both logical and mathematical operators to it. Thus, Codd was able to define both a relational calculus (based on logic) and a relational algebra (based on set theory). And the relational model was born.
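Codd's three points can be illustrated with a minimal Python sketch. Everything here is an assumption made for the example: the attribute names, the helpers tup, restrict, and project, and all sample data other than Joe's tuple. It shows tuples whose components are identified by name rather than position, a relation as a set of such tuples, and the query "Get all persons with salary 60K" written once in an algebra-like style and once in a calculus-like style.

```python
def tup(**attrs):
    """Build a tuple as an unordered, hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


# A relation is a set of tuples that all have the same attribute names.
emp = {
    tup(name="Joe", salary=50_000, dept="D4", hired=1993),  # from the text
    tup(name="Amy", salary=60_000, dept="D4", hired=1995),  # invented
    tup(name="Sue", salary=60_000, dept="D2", hired=1990),  # invented
}


def restrict(relation, predicate):
    """Keep the tuples for which `predicate` (given a dict view) holds."""
    return {t for t in relation if predicate(dict(t))}


def project(relation, attributes):
    """Keep only the named attributes; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attributes) for t in relation}


# Algebra-like style: compose operators that take and return relations.
algebra = project(restrict(emp, lambda t: t["salary"] == 60_000), {"name"})

# Calculus-like style: describe the result with a logical condition.
calculus = {tup(name=dict(t)["name"]) for t in emp if dict(t)["salary"] == 60_000}

print(algebra == calculus)
print(sorted(dict(t)["name"] for t in algebra))
```

Because each tuple is a set of attribute-value pairs, tup(name="Joe", salary=50_000) and tup(salary=50_000, name="Joe") are the same tuple: left to right ordering plays no part. Both formulations of the query yield the same relation, which here contains the names Amy and Sue.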



[173] One obvious objection is that there are clearly many nonrelational databases in existence already. True enough—but (unlike modern databases) those existing databases were never meant to be general purpose and application neutral; rather, they were typically built to serve some specific application. As a consequence, they don’t and can’t provide all of the functionality we’ve come to expect from a modern database (ad hoc query, view support, full data independence, flexible security and integrity controls, and so forth). In other words, I regard those older databases as nothing more than application specific data stores, and I would frankly prefer not to call them databases at all.
