Chapter 8. Validating XML Documents

 

In the future, airplanes will be flown by a dog and a pilot. And the dog’s job will be to make sure that if the pilot tries to touch any of the buttons, the dog bites him.

 
 --Scott Adams

In the quote, the job of Scott Adams’s dog is to prevent human error in piloting an airplane. The idea is that computers will have things well enough under control that human pilots will only get in the way. Although the complete removal of human pilots is unlikely in the near future, the prospect of computer-flown airplanes highlights the need for rock-solid software systems that are 100% error free. Whether or not you look forward to flying on an airplane piloted entirely by computers (I, for one, do not!), as an XML developer you should make it a huge priority to develop XML documents that are 100% error free. Fortunately, schemas (DTDs and XSDs) make it possible to assess the validity of XML documents. This hour shows you how to use various tools to validate documents against a DTD or XSD.

In this hour, you’ll learn

  • The ins and outs of document validation

  • The basics of validation tools and how to use them

  • How to assess and repair invalid XML documents

Document Validation Revisited

As you know by now, the goal of most XML documents is to be valid. Document validity is extremely important because it guarantees that the data within a document conforms to a standard set of guidelines as laid out in a schema (DTD, XSD, or RELAX NG schema). Not all documents have to be valid, which is why I used the word “most” a moment ago. For example, many XML applications use XML to code small chunks of data that really don’t require the thorough validation options made possible by a schema. Even in this case, however, all XML documents must be well formed. A well-formed document, as you may recall, is a document that adheres to the fundamental structure of the XML language. Rules for well-formed documents include matching start tags with end tags and setting values for all attributes used, among others.
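To make the well-formedness rules a little more concrete, here is a minimal sketch in Python. It assumes nothing more than the standard library's xml.etree.ElementTree module, and the helper name is_well_formed is my own invention for illustration. The sample fragments borrow element names from the training log example and are not complete documents from this book.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, otherwise False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments that follow or break the rules mentioned above.
samples = {
    "matched start and end tags": "<location>Warner Park</location>",
    "missing end tag": "<location>Warner Park",
    "attribute without a value": "<distance units>5.5</distance>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first fragment passes. Notice that no schema is involved anywhere in the check.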

An XML application can certainly determine if a document is well formed without any other information, but it requires a schema in order to assess document validity. This schema typically comes in the form of a DTD (Document Type Definition) or XSD (XML Schema Definition), which you learned about in Hour 3, “Defining Data with DTD Schemas,” and Hour 7, “Using XML Schema.” To recap, schemas allow you to establish the following ground rules that XML documents must adhere to in order to be considered valid:

  • Establish the elements that can appear in an XML document, along with the attributes that can be used with each

  • Determine whether an element is empty or contains content (text and/or child elements)

  • Determine the number and sequence of child elements within an element

  • Set the default value for attributes
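The ground rules in the list above can be sketched as a toy rule table in Python. To be clear, this is not a DTD or an XSD, and it is not how a real validating parser works internally; it is a hand-rolled stand-in meant only to show what kinds of checks a schema makes possible. The RULES dictionary, the check function, and the default value for the type attribute are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a schema: a plain dictionary, NOT a real DTD or XSD.
# The rule values (including the default for "type") are assumptions.
RULES = {
    "session": {
        "attributes": {"date", "type", "heartrate"},
        "children": ["duration", "distance", "location", "comments"],
        "defaults": {"type": "running"},
    },
    "duration": {"attributes": set(), "children": [], "defaults": {}},
    "distance": {"attributes": {"units"}, "children": [], "defaults": {}},
    "location": {"attributes": set(), "children": [], "defaults": {}},
    "comments": {"attributes": set(), "children": [], "defaults": {}},
}


def check(element):
    """Return a list of rule violations for element and its descendants."""
    rule = RULES.get(element.tag)
    if rule is None:
        return ["<%s> is not a known element" % element.tag]
    errors = []
    # Rule: only declared attributes may appear on an element.
    for name in element.attrib:
        if name not in rule["attributes"]:
            errors.append("<%s> has unexpected attribute '%s'" % (element.tag, name))
    # Rule: fill in declared defaults for attributes the document omitted.
    for name, value in rule["defaults"].items():
        element.attrib.setdefault(name, value)
    # Rule: child elements must appear in the declared number and sequence.
    found = [child.tag for child in element]
    if found != rule["children"]:
        errors.append("<%s> expected children %s but found %s"
                      % (element.tag, rule["children"], found))
    for child in element:
        errors.extend(check(child))
    return errors


good = ET.fromstring(
    '<session date="2001-11-19"><duration>PT45M</duration>'
    '<distance units="miles">5.5</distance><location>Warner Park</location>'
    '<comments>Felt fine.</comments></session>'
)
print(check(good), good.get("type"))
```

A real schema language expresses these same ideas declaratively, and a validating parser does the checking for you.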

It’s probably safe to say that you have a good grasp on the usefulness of schemas, but you might be wondering about the details of how an XML document is actually validated with a schema. This task begins with the XML processor, which is typically a part of an XML application. The job of an XML processor is to process XML documents and somehow make the results available for further processing or display within an application. A modern web browser, such as Internet Explorer, Firefox, Safari, or Opera, includes an XML processor that is capable of processing an XML document and displaying it using a style sheet. The XML processor knows nothing about the style sheet—it just hands over the processed XML content for the browser to render.

The actual processing of an XML document is carried out by a special piece of software known as an XML parser. An XML parser is responsible for the nitty-gritty details of reading the characters in an XML document and resolving them into meaningful tags and relevant data. There are two types of parsers capable of being used during the processing of an XML document:

  • Standard (non-validating) parser

  • Validating parser

A standard XML parser, or non-validating parser, reads and analyzes a document to ensure that it is well formed. A standard parser checks to make sure that you’ve followed the basic language and syntax rules of XML. Standard XML parsers do not check to see if a document is valid; that’s the job of a validating parser. A validating parser picks up where a standard parser leaves off by comparing a document with its schema and making sure it adheres to the rules laid out in the schema. Because a document must be well formed as part of being valid, a standard parser is still used when a document is being validated. In other words, a standard parser first checks to see if a document is well formed, and then a validating parser checks to see if it is valid.
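The difference between the two kinds of parsers is easy to see with a small experiment. The following sketch assumes Python's standard xml.etree.ElementTree module, which is built on a non-validating parser. The document carries a tiny, made-up internal DTD that it deliberately violates, yet the parser accepts it because the markup is well formed.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset says a session holds exactly one duration element,
# but the document body supplies a location element instead. The DTD here is
# a hypothetical one written for this sketch.
DOCUMENT = """<!DOCTYPE session [
  <!ELEMENT session (duration)>
  <!ELEMENT duration (#PCDATA)>
  <!ELEMENT location (#PCDATA)>
]>
<session><location>Warner Park</location></session>"""

# No exception is raised: the parser checks well-formedness only.
root = ET.fromstring(DOCUMENT)
child_tags = [child.tag for child in root]
print(child_tags)
```

A validating parser given the same document would be expected to report that the content of session doesn't match its declaration.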

By the Way

In actuality, a validating parser includes a standard parser so that there is technically only one parser that can operate in two different modes.

When you begin looking for a means to validate your documents, make sure you find an XML application that includes a validating parser. Without a validating parser, there is no way to validate your documents. You can still see if they are well formed by using a standard parser only, which is certainly important, but it’s generally a good idea to go ahead and carry out a full validation.

Validation Tools

It’s good to develop an eye for clean XML code, which makes it possible to develop XML documents with a minimal number of errors. However, it’s difficult for any human to perform such a technical task flawlessly, which is where XML validation tools come into play. XML validation tools are used to analyze the contents of XML documents to make sure they conform to a schema. There are two main types of validation tools available:

  • Web-based tools

  • Standalone tools

Web-based tools are web pages that allow you to enter the path (URI) of an XML document to have it validated. The upside to web-based tools is that they can be used without installing special software; just open the web page in a web browser and go for it! The downside is that they sometimes don’t work well with files that aren’t publicly available on the Internet. For example, if you’re working on an XML document on your local hard drive, it can be tough getting a web-based validation tool to work properly if the schema is also stored locally. Typically it’s a matter of getting the tool to recognize the schema: if the schema is located on the Internet there usually isn’t a problem, but a schema on your local hard drive is often out of the tool’s reach.

If you’re planning to do a lot of XML development work on your local hard drive, you might want to consider using a standalone validation tool. Standalone validation tools are tools that you must install on your computer in order to use. These kinds of tools range from full-blown XML editors such as XML Spy to command-line XML validators such as the W3C’s XSV validator. Standalone validation tools have the benefit of allowing you to validate local files with ease. The drawback to these tools is that some of them aren’t cheap, and they must be installed on your computer. However, if you don’t mind spending a little money, a standalone tool can come in extremely handy.

By the Way

For the record, not all standalone tools cost money. For example, the W3C’s XSV validator is available for free download at http://www.ltg.ed.ac.uk/~ht/xsv-status.html. XMLStarlet is another good option. It is an open source command-line XML tool that is freely available to download at http://xmlstar.sourceforge.net/.

Regardless of what type of validation tool you decide to use, there is a big distinction between validating documents against DTDs and validating them against XSDs. Although some tools support both types of schemas, many tools do not. You should therefore consider what type of schema you plan on using when assessing the different tools out there.

DTD Validation

DTDs have been around much longer than XSDs, so you’ll find that there are many more validation tools available for DTDs. One of the best tools I’ve found is made available by Brown University’s Scholarly Technology Group, or STG. STG’s tool comes in the form of a web page known as the XML Validation web page, which is accessible online at http://www.stg.brown.edu/service/xmlvalid/. Of course, this validation tool falls into the category of web-based tools. Figure 8.1 shows the STG XML Validation web page.


Figure 8.1. Brown University’s Scholarly Technology Group has an XML Validation web page that can be used to validate documents against DTDs.

Similar to most web-based validation tools, there are two approaches available for validating documents with the XML Validation web page:

  • Access the document on the Internet

  • Access the document locally

Because you will likely be developing XML documents locally, the latter option may look like the simplest. As I mentioned earlier, however, validators sometimes have problems with local schemas, so the route most likely to run smoothly is to place your document(s) and schema on a computer that is accessible on the Internet via a URI. That way you guarantee that the validator can find both the document and its schema, which are required for validation.
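Before handing a document to a web-based validator, it can help to see where its document type declaration actually points. The following Python sketch assumes the standard xml.parsers.expat and urllib.parse modules. The function names, the trainlog root element, the etml.dtd filename, and the example.com address are placeholders of my own, not details taken from the STG tool.

```python
import xml.parsers.expat
from urllib.parse import urlparse


def dtd_location(xml_text):
    """Return the system identifier from the DOCTYPE declaration, or None."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append(system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None


def is_reachable_online(location):
    """Rough guess at whether a remote validator could fetch this location."""
    if location is None:
        return False
    return urlparse(location).scheme in ("http", "https")


# Hypothetical documents: one points at a local DTD, the other at a hosted one.
local_doc = '<!DOCTYPE trainlog SYSTEM "etml.dtd"><trainlog/>'
hosted_doc = '<!DOCTYPE trainlog SYSTEM "http://www.example.com/etml.dtd"><trainlog/>'

for doc in (local_doc, hosted_doc):
    location = dtd_location(doc)
    print(location, "->", is_reachable_online(location))
```

A relative filename such as etml.dtd only means something on your own machine, which is exactly why an online tool may fail to find it.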

By the Way

If you’re able to post your schema to the Web so that it is available online, you can still use the XML Validation web page to validate local XML documents.

After specifying the document to the XML Validation web page and clicking the Validate button, any errors found during validation will be displayed in your web browser (see Figure 8.2).


Figure 8.2. The STG XML Validation web page reveals errors in an XML document during validation.

The figure reveals errors that were introduced when I deliberately removed the final closing </comments> tag, which invalidates the document. Fortunately, the XML Validation web page caught the problem and alerted me. After repairing the problem and initiating the validation process again, everything turns out fine (see Figure 8.3).


Figure 8.3. The STG XML Validation web page reports that a document is indeed valid.

By the Way

The online STG Validation web page only validates documents against DTDs, so you won’t be able to use it to validate against XSDs.

If you want a quick (and affordable) way to validate local documents against a DTD, then I highly recommend the <oXygen/> XML Editor by SyncRO Soft, which you first learned about in Hour 2, “Creating XML Documents.” You can download the <oXygen/> XML Editor for free at http://www.oxygenxml.com/. You’ll have to register the product in order to run it, but registration is entirely free and you can choose not to receive email solicitations during registration. Figure 8.4 shows the <oXygen/> XML Editor reporting an error in a document during validation. The error in this case is the same missing closing </comments> tag error that was shown in Figure 8.2.


Figure 8.4. The <oXygen/> XML Editor is a good tool for performing DTD document validation on local XML documents.

By the Way

XMLSpy Home Edition is another good free XML editor that supports the validation of XML documents using both DTDs and XSD schemas. To find out more, visit http://www.altova.com/support_freexmlspyhome.asp.

You can tell in the figure that <oXygen/> is extremely informative when it comes to detecting errors and alerting you to them. In fact, in this example the line of code containing the missing </comments> tag is highlighted to indicate where the problem lies. This kind of detailed error analysis is what makes tools such as <oXygen/> worth considering. The <oXygen/> XML Editor is also capable of validating documents against XSDs, which you learn how to do next.

XSD Validation

You’ll be glad to know that validating XML documents against an XSD is not much different from validating them against DTDs. It’s still important to have the XSD properly associated with the documents, as you learned how to do in the previous lesson. Once that’s done, it’s basically a matter of feeding the document to a validation tool that is capable of handling XSD schemas. The W3C offers an online XSD validation tool called the W3C Validator for XML Schema, which is located at http://www.w3.org/2001/03/webdata/xsv/. This web-based validator works similarly to the web-based DTD validator you learned about in the previous section. You specify the URI of a document and the validator does all the work. Although the W3C Validator for XML Schema can certainly get the job done, it has a limitation in that you must have the schema file hosted online in order to validate documents.
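If you aren't sure whether a document is associated with an XSD, you can check for the schema hint attributes on its root element. Here is a minimal Python sketch that assumes the standard xml.etree.ElementTree module. The schema_hints function and the trainlog root element name are assumptions of mine; etml.xsd is the schema filename used with the training log example.

```python
import xml.etree.ElementTree as ET

# Namespace that the xsi: prefix is conventionally bound to.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hints(xml_text):
    """Return the XSD location hints found on the document's root element."""
    root = ET.fromstring(xml_text)
    return {
        "noNamespaceSchemaLocation": root.get("{%s}noNamespaceSchemaLocation" % XSI),
        "schemaLocation": root.get("{%s}schemaLocation" % XSI),
    }


# Hypothetical document; the root element name is an assumption.
document = (
    '<trainlog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="etml.xsd"/>'
)

hints = schema_hints(document)
print(hints)
```

If both values come back empty, a validation tool has nothing in the document to tell it which schema to use, and you'll need to supply the schema to the tool some other way.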

By the Way

The underlying XML processor used in the W3C Validator for XML Schema web page is called XSV and is also available from the W3C as a standalone validator. You can download this standalone command-line validator for free from the W3C at http://www.ltg.ed.ac.uk/~ht/xsv-status.html.

An even better online validation tool for use with local files is the online XML Schema Validator, which is very easy to use. This validator is accessible online at http://apps.gotdotnet.com/xmltools/xsdvalidator/. What makes this online validator particularly useful is that it allows you to specify both the XML document and its XSD schema as local files, which means the schema file doesn’t have to be publicly accessible online. Figure 8.5 shows the XML Schema Validator successfully validating the training log example document against the etml.xsd schema.


Figure 8.5. The online XML Schema Validator serves as an excellent tool for validating an XML document against an XSD schema.

If you are planning on working with a lot of documents locally or maybe are looking for additional features in a validation tool, I’d return to the familiar <oXygen/> XML Editor that I mentioned in the previous section. In addition to DTD validation, <oXygen/> also supports XSD validation. Figure 8.6 shows the same training log document being successfully validated using the <oXygen/> XML Editor.


Figure 8.6. The <oXygen/> XML Editor is also useful for validating XML documents against XSDs.


The <oXygen/> XML Editor is an excellent and freely available local tool for use in validating XML documents with either DTD or XSD schemas.

Repairing Invalid Documents

If you have any programming experience, the term “debugging” is no doubt familiar to you. If not, get ready because debugging is often the most difficult part of any software development project. Debugging refers to the process of finding and fixing errors in a software application. Depending on the complexity of the code in an application, debugging can get quite messy. The process of repairing invalid XML documents is in many ways similar to debugging software. However, XML isn’t a programming language and XML documents aren’t programs, which makes things considerably easier for XML developers. That’s the good news.

The other good news is that validation tools give you a huge boost when it comes to making your XML documents free of errors. Not only do most validation tools alert you to the existence of errors in a document, but also most of them will give you a pretty good idea about where the errors are in the document. This is no small benefit. Even an experienced XML developer can overlook the most obvious errors after staring at code for long periods of time. Not only that, but XML is an extremely picky language, which leaves the door wide open for you to make mistakes. Errors are, unfortunately, a natural part of the development process, be it software, XML documents, or typing skills that you are developing.

So, knowing that your XML documents are bound to have a few mistakes, how do you go about finding and eliminating the errors? The first step is to run the document through a standard XML parser to check that the document is well formed. Remember that any validation tool will check if a document is well formed if you don’t associate the document with a schema. As an example, the <oXygen/> XML Editor includes a toolbar button for simply checking that a document is well formed, as opposed to carrying out a full document validation (see Figure 8.7).


Figure 8.7. It is often helpful to test an XML document for well formedness before taking things to the next level and performing a full validation against a schema.


The first time you create a document, consider taking it for a spin through a validation tool without associating it with a schema. At this stage the tool will report only errors in the document that have to do with it being well formed. In other words, no validity checks will be made, which is fine for now.

Errors caught during the well-formedness check include start and end tags whose names don’t match (often the result of a typo), unmatched tag pairs, and unquoted attribute values, to name a few. These errors should be relatively easy to find, and at some point you should get pretty good at creating documents that are close to being well formed on the first try. In other words, it isn’t too terribly difficult to avoid the errors that keep a document from being well formed.

After you’ve determined that your document is well formed, you can wire it back to a schema and take a shot at checking it for validity. Don’t be too disappointed if several errors are reported the first time around. Keep in mind that you are working with a very demanding technology in XML that insists on things being absolutely 100% accurate. You must use elements and attributes in the exact manner as they are laid out in a schema; anything else will lead to validity errors.

Perhaps the trickiest error to diagnose is invalid nesting. If you accidentally close an element in the wrong place with a misplaced end tag, it can really confuse a validation tool and give you some strange results. Following is a simple example of what I’m talking about:

<session date="2001-11-19" type="running" heartrate="158">
  <duration>PT45M</duration>
  <distance units="miles">5.5</distance>
  <location>Warner Park</location>
  <comments>Mid-morning run, a little winded throughout.
</session>
</comments>

In this code the closing </comments> tag appears after the closing </session> tag, which is an overlap error because the entire comments element should be inside of the session element. The problem with this kind of error is that it often confuses the validation tool. There is no doubt that you’ll get an error report, but it may not isolate the error as accurately as you had hoped. It’s even possible for the validation tool to get confused to the extent that a domino effect results, where the single misplaced tag causes many other errors. So, if you get a slew of errors that don’t seem to make much sense, study your document carefully and make sure all of your start and end tags match up properly.
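To see how a parser reports the overlap, here is a small Python sketch that feeds the preceding code to the standard xml.etree.ElementTree module and captures the first error. The first_error function is a helper of my own, and the exact wording and position a tool gives you will vary from one parser to the next.

```python
import xml.etree.ElementTree as ET

# The overlapping example from above, reproduced as a string.
BROKEN = """<session date="2001-11-19" type="running" heartrate="158">
  <duration>PT45M</duration>
  <distance units="miles">5.5</distance>
  <location>Warner Park</location>
  <comments>Mid-morning run, a little winded throughout.
</session>
</comments>"""


def first_error(xml_text):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


print(first_error(BROKEN))
```

This parser flags the problem on the line holding </session>, which is where the overlap first becomes visible, not the line where the comments element was opened. That gap between where an error is reported and where it was introduced is a big part of why nesting mistakes take extra detective work.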

Beyond the misplaced end tag problem, most validity errors are relatively easy to track down with the help of a good validation tool. Just pay close attention to the output of the tool, and tackle each error one at a time. With a little diligence, you can have valid documents without much work.

Summary

In previous hours you learned how to create XML documents and schemas that could be used to validate those documents for technical accuracy. What you didn’t learn how to do, however, was actually carry out the validation of XML documents. Document validation isn’t something you can carry out yourself with a pencil and piece of paper, or even a calculator—validation is an automated process carried out by a special software application known as a validation tool, or validator. Validation tools are a critical part of the XML development process because they allow you to determine the correctness of your documents.

In this hour you learned how document validation is carried out by XML applications. More specifically, you learned about standard and validating parsers and how they fit into the validation equation. You then learned how to use some of the different validation tools available for use in validating XML documents, including both online and local tools. And finally, you found out how to track down and repair errors in XML documents.

Q&A

Q.

Is it possible to validate HTML web pages?

A.

Strictly speaking, it isn’t possible to validate HTML documents with XML validation tools because HTML isn’t actually an XML language and therefore doesn’t conform to the rules of XML. However, you can code web pages in XHTML, which is a markup language that can be validated. XHTML is an XML-based language that is very similar to HTML but with the improved structure of XML. You learn all about XHTML in Hour 21, “Adding Structure to the Web with XHTML,” including how to validate XHTML web pages.

Q.

Do I have to check a document for well formedness before moving on to checking it for validity?

A.

No. I recommend the two-stage check on a document only because it helps to clarify the different types of errors commonly found in XML documents. Your goal should be to develop enough XML coding skills to avoid most well-formedness errors, which frees you up to spend most of your time tackling the peskier validity errors. Knowing this, at some point you may decide to jump straight into the validity check for documents as you gain experience.

Workshop

The Workshop is designed to help you anticipate possible questions, review what you’ve learned, and begin learning how to put your knowledge into practice.

Quiz

1.

What is an XML processor?

2.

What’s the difference between a standard parser and a validating parser?

3.

What is the main limitation of some web-based XML validation tools?

Quiz Answers

1.

An XML processor is usually part of a larger XML application, and its job is to process XML documents and somehow make the results available for further processing or display within the application.

2.

A standard parser first checks to see if a document is well formed, whereas a validating parser further checks to see if it is valid.

3.

Some web-based XML validation tools can be difficult to use when the documents to be validated (and their associated schemas) are stored on a local hard drive. The XML Schema Validator mentioned in this lesson is one notable online tool that doesn’t suffer from this problem.

Exercises

1.

Modify one of the training log example documents so that the code intentionally violates the ETML schema. Run the file through a validation tool and take note of the error(s) reported.

2.

Repair the erroneous code in the training log document and make sure that it validates properly in a validation tool.
