Using DTDs

DTD validation guarantees that the source document complies with the validity constraints defined in a document type definition (DTD), which is usually stored in a separate file. A DTD uses a formal grammar to describe both the structure and the syntax of a class of XML documents. XML authors use DTDs to narrow the set of tags and attributes allowed in their documents, and validating against a DTD ensures that processed documents conform to the specified structure. From a language perspective, a DTD defines a new and stricter XML-based vocabulary: a tagged language tailor-made for a related group of documents.

Historically speaking, the DTD was the first tool capable of defining the structure of a document. DTDs were introduced with SGML (ISO 8879, published in 1986), the ISO standard for defining markup languages. SGML is considered the ancestor of today’s XML, which sprang to life in the late 1990s as a simplified subset of SGML, designed to shed the overly rigid architecture of its parent.

DTDs use their own non-XML syntax to declare markup constructs (elements and attributes) as well as additional items such as character entities. You can correctly think of DTDs as an early form of XML schema. Although doomed to obsolescence, DTDs are still supported by virtually all XML parsers.

An XML document is associated with a DTD through the DOCTYPE declaration (<!DOCTYPE ...>). A validating parser (for example, the XmlValidatingReader class) recognizes this declaration and extracts the schema information from it. The DOCTYPE declaration can either contain an inline DTD (the internal subset) or reference an external DTD file.
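
To make the inline option concrete, here is a minimal sketch, not taken from the original text, that validates in-memory documents whose DOCTYPE carries the DTD inline, between square brackets. It uses the same XmlValidatingReader class as the rest of this chapter; that class is marked obsolete in .NET Framework 2.0 and later, hence the pragma. The class name InlineDtdDemo, the Validate helper, and the note vocabulary are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable CS0618 // XmlValidatingReader is obsolete since .NET 2.0

public static class InlineDtdDemo
{
    // The DTD travels inside the DOCTYPE, between the square brackets.
    public const string ValidNote = @"
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Dino</to><body>Hello</body></note>";

    // Same inline DTD, but the required <body> child is missing.
    public const string InvalidNote = @"
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Dino</to></note>";

    // Reads the whole document through a DTD-validating reader and returns
    // each validation event as "Severity: message".
    public static List<string> Validate(string xml)
    {
        var messages = new List<string>();
        var reader = new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.DTD;
        reader.ValidationEventHandler += (sender, e) =>
            messages.Add(e.Severity + ": " + e.Message);
        while (reader.Read()) { }
        reader.Close();
        return messages;
    }

    public static void Main()
    {
        Console.WriteLine(Validate(ValidNote).Count);
        foreach (string message in Validate(InvalidNote))
            Console.WriteLine(message);
    }
}
```

Because a handler is attached, the reader reports the missing body element as an Error-severity event instead of throwing, and reading continues to the end of the document.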

Developing a DTD Grammar

Let’s look more closely at a DTD file. Normally, you would build a DTD by writing the file directly according to its syntax. In this case, however, we’ll start from the XML file that the DTD will validate, data_dtd.xml, shown here:

<?xml version="1.0" ?>
<!DOCTYPE class SYSTEM "class.dtd">

<!-- Sample XML document (data_dtd.xml) using a DTD -->

<class title="Applied XML Programming for .NET" 
   company="DinoEsposito’s Own Company" 
   author="Dino Esposito">
   <days total="5" expandable="true">
      <day id="1">XML Core Classes</day>
      <day id="2">Related Technologies</day>
      <day id="3">XML and ADO.NET</day>
      <day id="4" optional="true">XML and Applications</day>
      <day id="5" optional="true">XML Interoperability</day>
   </days>
</class>

As you can see, the file describes a class through its modules and the topics they cover. The general information about the class (title, author, training company) is expressed through attributes. Each module spans a full day, and its description is plain text.

Any XML document that must be validated against a given DTD file includes a DOCTYPE declaration that links it to the DTD of choice, as shown here:

<!DOCTYPE class SYSTEM "class.dtd">

The name following DOCTYPE, the document type name, identifies the vocabulary described by the DTD. This information is extremely important for the validation process: if the document type name does not match the name of the document’s root element, a validation error is raised. The string following the SYSTEM keyword is the system identifier, the URL from which the DTD will actually be loaded.
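
The root-name check can be seen in isolation with the following sketch. It assumes the XmlValidatingReader setup used in this chapter and, for brevity, inline DTDs instead of an external class.dtd; the class name DocTypeNameDemo and the two sample documents are hypothetical. The documents differ only in the name after DOCTYPE, and only the mismatched one is expected to produce an Error-severity event.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable CS0618 // XmlValidatingReader is obsolete since .NET 2.0

public static class DocTypeNameDemo
{
    // Document type name 'class' matches the root element <class>.
    public const string Matching = @"
<!DOCTYPE class [
  <!ELEMENT class (#PCDATA)>
]>
<class>Applied XML Programming for .NET</class>";

    // Document type name 'course' does not match the root element <class>.
    public const string Mismatched = @"
<!DOCTYPE course [
  <!ELEMENT class (#PCDATA)>
]>
<class>Applied XML Programming for .NET</class>";

    // Returns every validation event raised while reading the document.
    public static List<string> Validate(string xml)
    {
        var messages = new List<string>();
        var reader = new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.DTD;
        reader.ValidationEventHandler += (sender, e) =>
            messages.Add(e.Severity + ": " + e.Message);
        while (reader.Read()) { }
        reader.Close();
        return messages;
    }

    public static void Main()
    {
        Console.WriteLine("Matching: " + Validate(Matching).Count + " event(s)");
        foreach (string message in Validate(Mismatched))
            Console.WriteLine("Mismatched: " + message);
    }
}
```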

The following listing demonstrates a DTD that is tailor-made for the preceding XML document:

<!ELEMENT class (days)>
<!ATTLIST class title CDATA #REQUIRED
   author CDATA #IMPLIED
   company CDATA #IMPLIED>

<!ENTITY % Boolean "true | false">

<!ELEMENT days (day*)>
<!ATTLIST days total CDATA #REQUIRED
   expandable (%Boolean;) #REQUIRED>

<!ELEMENT day (#PCDATA)>
<!ATTLIST day id CDATA #REQUIRED
   optional (%Boolean;) #IMPLIED>

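The ELEMENT declaration defines an element and its content model (for example, day* means zero or more day elements), whereas ATTLIST declares all the attributes of a given element. #REQUIRED marks an attribute as mandatory, while #IMPLIED marks it as optional. Most attributes here are declared with the CDATA type, which accepts any string of character data. In some cases, however, an attribute can be restricted to an enumerated list of values, here supplied through the Boolean parameter entity. This is the case for the expandable attribute, whose only permitted values are true and false.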
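
As a hedged illustration of enumerated attribute types, the sketch below inlines the class DTD shown above and validates two variants of the sample document: one with expandable='true' and one with expandable='maybe'. One adjustment was necessary: XML 1.0 forbids parameter-entity references inside markup declarations in the internal subset, so %Boolean; is expanded by hand to (true | false). The class name EnumeratedAttributeDemo and its helpers are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable CS0618 // XmlValidatingReader is obsolete since .NET 2.0

public static class EnumeratedAttributeDemo
{
    // The chapter's class.dtd, inlined; %Boolean; is expanded by hand because
    // PE references are not allowed inside declarations in the internal subset.
    private const string Dtd = @"<!DOCTYPE class [
  <!ELEMENT class (days)>
  <!ATTLIST class title CDATA #REQUIRED
     author CDATA #IMPLIED
     company CDATA #IMPLIED>
  <!ELEMENT days (day*)>
  <!ATTLIST days total CDATA #REQUIRED
     expandable (true | false) #REQUIRED>
  <!ELEMENT day (#PCDATA)>
  <!ATTLIST day id CDATA #REQUIRED
     optional (true | false) #IMPLIED>
]>";

    // Builds a two-day version of data_dtd.xml with the given expandable value.
    public static string Document(string expandable) =>
        Dtd +
        "<class title='Applied XML Programming for .NET' author='Dino Esposito'>" +
        "<days total='2' expandable='" + expandable + "'>" +
        "<day id='1'>XML Core Classes</day>" +
        "<day id='2' optional='true'>Related Technologies</day>" +
        "</days></class>";

    // Returns every validation event raised while reading the document.
    public static List<string> Validate(string xml)
    {
        var messages = new List<string>();
        var reader = new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.DTD;
        reader.ValidationEventHandler += (sender, e) =>
            messages.Add(e.Severity + ": " + e.Message);
        while (reader.Read()) { }
        reader.Close();
        return messages;
    }

    public static void Main()
    {
        Console.WriteLine("expandable='true': " + Validate(Document("true")).Count + " event(s)");
        foreach (string message in Validate(Document("maybe")))
            Console.WriteLine("expandable='maybe': " + message);
    }
}
```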

In the section “Further Reading,” on page 133, you’ll find references for learning more about DTD syntax. What first catches the eye about DTDs is that they are written in a language of their own that only mimics the typical markup of XML.

Validating Against a DTD

The following code snippet creates an XmlValidatingReader object that works on the sample XML file data_dtd.xml discussed in the section “Developing a DTD Grammar,” on page 97. The document is bound to a DTD file and is validated using the DTD validation type.

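using System.Xml;
using System.Xml.Schema;   // ValidationEventHandler, ValidationEventArgs

XmlTextReader coreReader = new XmlTextReader("data_dtd.xml");
XmlValidatingReader reader = new XmlValidatingReader(coreReader);
reader.ValidationType = ValidationType.DTD;
reader.ValidationEventHandler += new ValidationEventHandler(MyHandler);
while (reader.Read()) { }   // read to the end; validation events go to MyHandler
reader.Close();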

Remember that when the validation type is set to Auto, the DTD option is the first to be considered.

When the validation type is set to DTD, the validating parser issues a warning if the document does not reference any DTD. Otherwise, if a DTD is correctly linked and accessible, the validation is performed and, in the process, entities are expanded. If the linked DTD file is not available, an exception is raised. What you get, however, is not a schema exception but a plain FileNotFoundException.
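
The no-DOCTYPE case can be sketched as follows, assuming the same XmlValidatingReader setup as the snippet earlier in this section. The class name NoDtdDemo and the sample document are hypothetical, and the handler sorts events by e.Severity in the same way as the MyHandler routine discussed later. For a document without any DOCTYPE, no Error-severity events are expected; anything reported should arrive in the warnings list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable CS0618 // XmlValidatingReader is obsolete since .NET 2.0

public static class NoDtdDemo
{
    // Validates with ValidationType.DTD and sorts events by severity.
    public static void Validate(string xml, List<string> errors, List<string> warnings)
    {
        var reader = new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.DTD;
        reader.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error)
                errors.Add(e.Message);
            else
                warnings.Add(e.Message);
        };
        while (reader.Read()) { }
        reader.Close();
    }

    public static void Main()
    {
        // No DOCTYPE at all, so there is no DTD to validate against.
        const string xml = "<class title='Applied XML Programming for .NET'/>";
        var errors = new List<string>();
        var warnings = new List<string>();
        Validate(xml, errors, warnings);
        Console.WriteLine("Errors: " + errors.Count);
        foreach (string w in warnings)
            Console.WriteLine("Warning: " + w);
    }
}
```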

If you mistakenly use DTD validation on an XML file that carries schema information, you do not get an error: the problem is reported as a validation event with the low Warning severity, informing you that no DTD has been found in the XML file. Figure 3-4 shows how the sample application handles this situation.

Figure 3-4. When you try to use a DTD to validate an XML document with schema information, the validating parser returns a warning.


In general, if you decide that schema warnings are not serious enough to interrupt the ongoing validation process, you can ignore them and react only to errors with a handler like the following:

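private void MyHandler(object sender, ValidationEventArgs e)
{
   // Warnings (XmlSeverityType.Warning) fall through and are ignored
   if (e.Severity == XmlSeverityType.Error)
   {
      // Handle the validation error here (log it, collect it, or stop reading);
      // e.Message describes the problem
   }
}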

Usage and Trade-Offs for DTDs

Unquestionably, DTD is an old validation format, albeit one supported by virtually all available parsers. But if you are designing the validation layer for an XML-driven data exchange infrastructure today, there is no reason to pass over XSDs. XSDs are more powerful than DTDs and, more important, they became a W3C Recommendation in May 2001, so they are a standard too.

So when should you use DTDs instead of XSDs, and under what circumstances will DTDs give you a better trade-off? Compatibility and legacy code are the only possible answers. Especially if your application handles complex DTDs, porting them to XSD can be costly and is in no way an easy task. There is no official and totally reliable tool to automatically convert DTDs to schemas. On the W3C Web site (http://www.w3.org), you’ll find a conversion tool available for download, but I wouldn’t let it run unsupervised and then accept its output as trustworthy.

Converting DTDs to schemas is no simple matter; in fact, it can be as complex as translating between spoken languages. Translating from English to Italian, for example, requires reworking the entire text, not just adapting individual words and sentences. In the same way, converting a DTD is a design task: you should also consider rearchitecting tags into types and perhaps rethinking the way you expose data in light of the new features.

XSDs certainly provide more features than DTDs. For one thing, schemas are written entirely in XML and don’t require you to learn a new language. Our basic DTD example might not scare you with its unusual format, but as you move from textbook examples into the tough real world, the limitations of an inflexible language like DTD become more apparent.

XSDs provide finer control over element cardinality and attribute data types. In addition, XSDs support type derivation (by extension or restriction), so you can set up a hierarchy in which more complex types are built atop existing ones.
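
To illustrate that extra control, here is a sketch of an XSD for the days element that states two constraints the DTD above expresses only clumsily or not at all: between one and five day children (a DTD could only spell this out as (day, day?, day?, day?, day?)), and a total attribute typed as xs:positiveInteger instead of CDATA. Unlike the other snippets in this chapter, it uses XmlReader.Create with XmlReaderSettings and XmlSchemaSet, the schema-validation API introduced in .NET Framework 2.0. The class name XsdCardinalityDemo, its helpers, and the schema itself are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

public static class XsdCardinalityDemo
{
    // 1 to 5 <day> children; total must be a positive integer.
    private const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='days'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='day' type='xs:string' minOccurs='1' maxOccurs='5'/>
      </xs:sequence>
      <xs:attribute name='total' type='xs:positiveInteger' use='required'/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Validates the document against the schema and returns the error messages.
    public static List<string> Validate(string xml)
    {
        var errors = new List<string>();
        var schemas = new XmlSchemaSet();
        using (XmlReader xsdReader = XmlReader.Create(new StringReader(Xsd)))
            schemas.Add(null, xsdReader);   // null: use the schema's own target namespace
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    // Builds <days total='...'> with the requested number of <day> children.
    public static string Days(string total, int count)
    {
        var sb = new StringBuilder("<days total='" + total + "'>");
        for (int i = 1; i <= count; i++)
            sb.Append("<day>Day " + i + "</day>");
        return sb.Append("</days>").ToString();
    }

    public static void Main()
    {
        Console.WriteLine("3 days: " + Validate(Days("3", 3)).Count + " error(s)");
        Console.WriteLine("6 days: " + Validate(Days("6", 6)).Count + " error(s)");
        Console.WriteLine("total='five': " + Validate(Days("five", 2)).Count + " error(s)");
    }
}
```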

All in all, if you currently have a huge, complex DTD, probably the best thing you can do is continue working with it while you carefully plan a migration to XSDs. DTDs and XSDs are both established standards, but especially if you are exchanging data between heterogeneous platforms, you’re more likely to find a DTD-compliant parser than an XSD-compliant one. This situation will change over time, but not anytime soon. Carefully check which features the XML parsers on the target platform support before you drop DTDs.
