DATA VALUE ATOMICITY

I hope the previous section succeeded in convincing you that domains really are types, no more and no less. Now I want to turn to the issue of data value atomicity and the related notion of first normal form (1NF for short). In Chapter 1, I said that 1NF meant that every tuple in every relation contains just a single value (of the appropriate type) in every attribute position—and it’s usual to add that those “single values” are supposed to be atomic. But this latter requirement raises the obvious question: What does it mean for data to be atomic?

Well, in A REVIEW OF THE ORIGINAL MODEL of the book mentioned earlier (The Relational Model for Database Management Version 2), Codd defines atomic data as data that “cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions).” Even if we ignore that parenthetical exclusion, however, this definition is a trifle puzzling; at best, it’s certainly not very precise. For example, what about character strings? Are character strings atomic? Well, every database product I know provides a variety of operators—LIKE, SUBSTR (substring), “||” (concatenate), and so on—that rely by definition on the fact that character strings in general can be “decomposed into smaller pieces by the DBMS.” So are such strings atomic? What do you think?

Here are some other examples of values whose atomicity is at least open to question and yet we would certainly want to allow as attribute values in tuples in relations:

  • Bit strings

  • Rational numbers (which might be regarded as being decomposable into integer and fractional parts)

  • Dates and times (which might be regarded as being decomposable into year / month / day and hour / minute / second components, respectively)

And so on.

Now I’d like to move on to what might be considered a more startling example. Refer to Figure 2-1 below. Relation R1 in that figure is a reduced version of the shipments relation from our running example; it shows that certain suppliers supply certain parts, and it contains one tuple for each legitimate (SNO,PNO) combination. For the sake of the example, let’s agree that supplier numbers and part numbers are indeed “atomic”; then we can presumably agree that R1, at least, is in 1NF.

Relations R1, R2, and R3

Figure 2-1. Relations R1, R2, and R3

Now suppose we replace R1 by R2, which shows that certain suppliers supply certain groups of parts (attribute PNO in R2 is what some writers would call multivalued, and values of that attribute are groups of part numbers). Then most people would surely say that R2 is not in 1NF; in fact, it looks like a case of “repeating groups,” and repeating groups are the one thing that just about everybody agrees 1NF is supposed to prohibit (because such groups are obviously not atomic—right?).

Well, let’s agree for the sake of the argument that R2 isn’t in 1NF. But suppose we now replace R2 by R3. Then I claim that R3 is in 1NF![22] For consider:

  • First, note that I’ve renamed the attribute PNO_SET, and I’ve shown the groups of part numbers that are PNO_SET values enclosed in braces, to emphasize the fact that each such group is indeed a single value: a set value, to be sure, but a set is still, at a certain level of abstraction, a single value.

  • Second (and regardless of what you might think of my first argument), the fact is that a set like {P2,P4,P5} is no more and no less decomposable by the DBMS than a character string is. Like character strings, sets do have some inner structure; as with character strings, however, it’s convenient to ignore that structure for certain purposes. In other words, if character strings are compatible with the requirements of 1NF—that is, if character strings are atomic—then sets must be, too.

The real point I’m getting at here is that the notion of atomicity has no absolute meaning; it just depends on what we want to do with the data. Sometimes we want to deal with an entire set of part numbers as a single thing; sometimes we want to deal with individual part numbers within that set—but then we’re descending to a lower level of detail, or lower level of abstraction. The following analogy might help. In physics (which after all is where the terminology of atomicity comes from) the situation is exactly parallel: Sometimes we want to think about individual atoms as indivisible things, sometimes we want to think about the subatomic particles (i.e., the protons, neutrons, and electrons) that make up those atoms. What’s more, protons and neutrons, at least, aren’t really indivisible, either—they contain a variety of “subsubatomic” particles called quarks. And so on, possibly (?).

Let’s return for a moment to relation R3. In Figure 2-1, I showed PNO_SET values as general sets. But it would be more useful in practice if they were, more specifically, relations (see Figure 2-2, where I’ve changed the attribute name to PNO_REL). Why would it be more useful? Because relations, not general sets, are what the relational model is all about.[23] As a consequence, the full power of the relational algebra immediately becomes available for the relations in question—they can be restricted, projected, joined, and so on. By contrast, if we were to use general sets instead of relations, then we would need to introduce new operators (set union, set intersection, and so on) for dealing with those sets ... Much better to get as much mileage as we can out of the operators we already have!

Relation R4 (a revised version of R3)

Figure 2-2. Relation R4 (a revised version of R3)

Terminology: Attribute PNO_REL in Figure 2-2 is a relation valued attribute (RVA). Of course, the underlying domain is relation valued too (that is, the values it’s made up of are relations). I’ll have more to say about RVAs in Chapter 7; here let me just note that SQL doesn’t support them. (More precisely, it doesn’t support what would be its analog of RVAs, table valued columns. Oddly enough, however, it does support columns whose values are arrays, and columns whose values are rows, and even columns whose values are “multisets of rows”—where a multiset, also known as a bag, is like a set except that it permits duplicates.[24] Columns whose values are multisets of rows thus do look a little bit like “table valued columns”; however, they aren’t table valued columns, because the values they contain can’t be operated upon by means of SQL’s regular table operators and thus aren’t regular SQL table values, by definition.)

Now, I chose the foregoing example deliberately, for its shock value. After all, relations with RVAs do look rather like “relations” with repeating groups, and you’ve probably always heard that repeating groups are a “no no” in the relational world. But I could have used any number of different examples to make my point; I could have shown attributes (and therefore domains) that contained arrays; or bags (multisets); or lists; or photographs; or audio or video recordings; or X rays; or fingerprints; or XML documents; or any other kind of value, “atomic” or “nonatomic,” you might care to think of. Attributes, and therefore domains, can contain anything (any values, that is).

Incidentally, you might recall that a few years ago we were hearing a great deal about so called “object/relational” systems. Well, the foregoing paragraph goes a long way toward explaining why a true object/relational system would in fact be nothing more nor less than a true relational system—which is to say, a system that supports the relational model, with all that such support entails (after all, the whole point about an object/relational system from the user’s point of view is precisely that we can have attribute values in relations that are of arbitrary complexity). Perhaps a better way to say it is: A proper object/relational system is just a relational system with proper type support (including proper user defined type support in particular)—which just means it’s a proper relational system, no more and no less. And what some are pleased to call “the object/relational model” is, likewise, just the relational model, no more and no less.



[22] Observe that I don’t claim it’s well designed—indeed, it probably isn’t—but that’s not the point. I’m concerned here with what’s legal, not with questions of good design. The design of R3 is legal.

[23] In case you’re wondering, the difference is that sets in general can contain anything, but relations contain tuples. Note, however, that a relation certainly resembles a general set inasmuch as it too can be regarded as a single value.

[24] The individual elements in an SQL multiset don’t have to be rows but can be values of any available SQL type—for example, integers. The same goes for arrays as well.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.197.95