When to Use XML and When Not to Use It

In Chapter 4, you braved the waters of abstract data modeling, so now it's time to get concrete. Building your XML DTD is the next step in data-oriented application design.

The first question you must ask, while looking at your abstract data model diagram from Chapter 4 (see Figure 4-7), is “Where doesn't XML fit?” Identify the parts of your data model that make more sense as purely relational entities. Such parts are usually supporting data, such as keywords and author names. For example, in CyberCinema there's no reason to list valid reviewers in an XML file. The reviewers and details about them (for example, names and e-mail addresses) can be stored in a relational table. The reviews themselves are best stored as XML.

Think Like an Archeologist

Put yourself in the mindset of an archeologist in the future, leafing through a collection of your XML instances. If you're thinking, “I am not writing this application for future archeologists,” you're missing the point. The Rosetta Stone wasn't created for the benefit of future archeologists; it was created for the dissemination of information in its own time. By carefully building and documenting the DTD, you make your data more useful not only for today but also in the future.

Suppose you're an archeologist who has come across a collection of XML instances. Consider what information must be in those instances for you to reconstruct the application that created them, that is, to crack the code of this found data. Let's use as an example the simple e-mail application we built a data model for in Chapter 4. Your first step should be to rough out an XML vocabulary by creating some XML instances. This helps you understand the tagging structure before jumping headfirst into DTD design. Create a few of these dummy files. They're not intended to be used by your programs; they're just a tool to help you visualize what the XML files will look like and how they need to be structured. A first stab at an XML instance for this application might look like the following:

<E-MAIL>
<FROM>Dan Appelquist</FROM>
<TO>Bill Gates</TO>
<SUBJECT>I Like Windows</SUBJECT>
<BODY>Thanks for all the hard work.</BODY>
</E-MAIL>

This XML instance makes it easy for us to distinguish the From: field from the To: field and so on, but it doesn't help us construct an application in conjunction with a relational database.

We're designing this XML to work hand-in-hand with a relational database. In particular, we have to understand where ID numbers come into play. Relational database schemas use numeric IDs to identify items uniquely and to preserve cross-references. Most of these ID numbers are generated as they are needed: When you insert a new row into a table, you first generate a new ID number. Depending on how your database works, ID number generation may be an automatic feature, or it may involve setting up a database sequence, a feature of a database that generates consecutive unique numbers. Or the ID could be a number generated somewhere in the application and inserted into the database. In any case, it is a number that's unique at least within a particular set of items (for example, e-mail messages in our simple e-mail application).
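As a minimal sketch of the "automatic feature" case, the following Python example uses SQLite, where an INTEGER PRIMARY KEY column is assigned by the database when a row is inserted. The table name email, its columns, the second sample message, and the helper insert_message are assumptions made for illustration; they are not part of the e-mail application's actual schema, and other databases handle ID generation differently (for example, with sequences).

```python
import sqlite3


def insert_message(conn, subject, body):
    """Insert one message and return the ID the database assigned to it."""
    cur = conn.execute(
        "INSERT INTO email (subject, body) VALUES (?, ?)",
        (subject, body),
    )
    # lastrowid is the row ID generated for the row we just inserted.
    return cur.lastrowid


# Hypothetical schema: in SQLite, an INTEGER PRIMARY KEY column is filled in
# automatically when the INSERT does not supply a value for it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email (id INTEGER PRIMARY KEY, subject TEXT, body TEXT)"
)

first_id = insert_message(conn, "I Like Windows", "Thanks for all the hard work.")
second_id = insert_message(conn, "Re: I Like Windows", "You're welcome.")
print(first_id, second_id)
```

Each call returns a different number, and that number is the value that would be recorded on the matching XML instance so the instance and the row can be matched up later.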

For our e-mail application, if we assume that ID numbers are going to be assigned to each e-mail message, those numbers have to be reflected in the XML instance as well, so that we know which XML instances match up with which rows in our relational database. We do that by adding an ID attribute to the E-MAIL element we've created. Note that the name “ID” doesn't have any special meaning in XML instances; I simply decided to call it that. Because users are another item that will be stored in the relational database, we'll also include an ID attribute in our TO and FROM elements. Our message subject and body don't need unique IDs because they belong to the message itself. We now have the following:

<E-MAIL ID="1">
<FROM ID="2">Dan Appelquist</FROM>
<TO ID="3">Bill Gates</TO>
<SUBJECT>I Like Windows</SUBJECT>
<BODY>Thanks for all the hard work.</BODY>
</E-MAIL>

Again, adopting the perspective of an archeologist in the future, suppose you came across thousands of XML instances in the preceding format. You could reconstruct not only a database of messages but also a database of all users who were part of this system. You could reconstruct entire conversations. If a user changed his or her name sometime during the operational period of the system, you wouldn't care because you'd be able to identify users by their unique IDs. In this manner, XML makes it possible for you to “future-proof” your documents.
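To make the archeologist's reconstruction concrete, here is a rough Python sketch that reads instances in the format shown above and rebuilds a table of users and a table of messages keyed by ID. The function name reconstruct, the dictionary layout, and the second sample instance (a reply in which user 2 appears under a changed name) are all invented for illustration; this is one possible approach, not the book's implementation.

```python
import xml.etree.ElementTree as ET


def reconstruct(instances):
    """Rebuild users and messages, keyed by ID, from E-MAIL instances."""
    users = {}
    messages = {}
    for text in instances:
        root = ET.fromstring(text)
        sender = root.find("FROM")
        recipient = root.find("TO")
        # Users are identified by ID, so a later name simply overwrites
        # an earlier one without creating a second user.
        for person in (sender, recipient):
            users[person.get("ID")] = person.text
        messages[root.get("ID")] = {
            "from": sender.get("ID"),
            "to": recipient.get("ID"),
            "subject": root.findtext("SUBJECT"),
            "body": root.findtext("BODY"),
        }
    return users, messages


# The first instance is the one from the text; the second is hypothetical.
found = [
    """<E-MAIL ID="1">
<FROM ID="2">Dan Appelquist</FROM>
<TO ID="3">Bill Gates</TO>
<SUBJECT>I Like Windows</SUBJECT>
<BODY>Thanks for all the hard work.</BODY>
</E-MAIL>""",
    """<E-MAIL ID="2">
<FROM ID="3">Bill Gates</FROM>
<TO ID="2">Daniel Appelquist</TO>
<SUBJECT>Re: I Like Windows</SUBJECT>
<BODY>You're welcome.</BODY>
</E-MAIL>""",
]

users, messages = reconstruct(found)
print(users)
print(messages["2"]["to"])
```

Because messages refer to users only by ID, the name change in the second instance does not break the link between the two messages: both still point at user 2.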

Oh, by the way, if you happen not to be a future archeologist but a system administrator or database administrator (DBA) frantically trying to reconstruct a database after a system crash, you may be thankful that the DTD was well designed. Future-proofing also means disaster-proofing your data.
