Chapter 5. Validating XML

Your knowledge base and accompanying bag of XML tricks should be starting to feel a little more solid by now. You can create XML, use the Java SAX classes to parse through that XML, and now constrain that XML. This leads us to the next logical step: validating XML with Java. Without the ability to validate XML, business-to-business and inter-application communication becomes significantly more difficult; while constraints enable portability of our data, validity ensures its consistency. In other words, being able to constrain a document doesn’t help much if we can’t ensure that those constraints are enforced within our XML applications.

In this chapter, we will look at using additional SAX classes and interfaces to enforce validity constraints in our XML documents. We will examine how to set features and properties of a SAX-compliant parser, allowing easy configuration of validation, namespace handling, and other parser functionality. In addition, the errors and warnings that can occur with validating parsers will be detailed, filling in the blanks from earlier discussions on the SAX error handlers.

Configuring the Parser

With the wealth of XML-related specifications and technologies emerging from the World Wide Web Consortium (W3C), adding support for any new feature or property of an XML parser has become difficult. Many parser implementations have added proprietary extensions or methods at the cost of the portability of the code. While these software packages may implement the SAX XMLReader interface, the methods for setting document and schema validation, namespace support, and other core features are not standard across parser implementations. To address this, SAX 2.0 defines a standard mechanism for setting important properties and features of a parser that allows the addition of new properties and features as they are accepted by the W3C without the use of proprietary extensions or methods.

Setting Properties and Features

Fortunately, SAX 2.0 includes the methods needed for setting properties and features in the XMLReader interface. This means we have to change little of our existing code to request validation, set the namespace separator, and handle other feature and property requests. The methods used for these purposes are outlined in Table 5-1.

Table 5-1. Property and Feature Methods

Method            Returns   Parameters                        Syntax
----------------  --------  --------------------------------  ------------------------------------------------------------------
setProperty(  )   void      String propertyID, Object value   parser.setProperty("[Property URI]", "[Object parameter]");
setFeature(  )    void      String featureID, boolean state   parser.setFeature("[Feature URI]", true);
getProperty(  )   Object    String propertyID                 String separator = (String)parser.getProperty("[Property URI]");
getFeature(  )    boolean   String featureID                  if (parser.getFeature("[Feature URI]")) { doSomething(  ); }

For each of these, the ID of a specific property or feature is a URI. The core set of features and properties is listed in Appendix B. Additional documentation on features and properties supported by your vendor’s XML parser should also be available. Keep in mind, though, that these URIs are similar to namespace URIs; they are only used as identifiers for particular features. Good parsers ensure that you do not need network access to resolve these features; in this sense, you can think of them as simple constants that happen to be in URI form. When one of these methods is invoked, the parser resolves the URI locally, often by mapping it to a constant that represents the action the parser needs to take.
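
As a minimal sketch of the feature methods in use, the class below sets a feature and then reads it back. It is an illustration rather than the chapter's own sample code: the class and method names (FeatureRoundTrip, newReader, setAndGet) are hypothetical, and the XMLReader is obtained through JAXP's SAXParserFactory on the assumption that the parser bundled with the JDK is used instead of a vendor class name. No network access is involved; the URIs act purely as identifiers.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

final class FeatureRoundTrip {
    // Standard SAX 2.0 feature URIs, used only as identifiers
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Assumption: use whatever JAXP parser is available rather than naming a vendor class
    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    // Sets the feature on a fresh parser, then reports the state the parser holds for it
    static boolean setAndGet(String featureUri, boolean state) throws Exception {
        XMLReader parser = newReader();
        parser.setFeature(featureUri, state);
        return parser.getFeature(featureUri);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("validation: " + setAndGet(VALIDATION, true));
        System.out.println("namespaces: " + setAndGet(NAMESPACES, true));
    }
}
```

The same two calls work for any feature URI the parser recognizes, which is what keeps application code independent of the parser vendor.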

In the parser configuration context, a property is a named setting whose value is an arbitrary object that the parser needs in order to do its work. For example, for lexical handling, a LexicalHandler implementation class might be supplied as the value for the property. In contrast, a feature is a flag used by the parser to indicate whether a certain type of processing should occur. Common features are validation, namespace support, and inclusion of external parameter entities.
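
The difference can be sketched with the lexical handler case mentioned above. In this hedged example, an object is passed to setProperty(  ) using the standard SAX 2.0 property URI for lexical handlers, and the parser then reports comments to it. The class names (LexicalPropertyDemo, CommentCollector) and the sample document are hypothetical, and the parser is again assumed to come from JAXP's SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class LexicalPropertyDemo {
    // Standard SAX 2.0 property URI for registering a LexicalHandler
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    // DefaultHandler2 implements LexicalHandler; we only override comment()
    static final class CommentCollector extends DefaultHandler2 {
        final List<String> comments = new ArrayList<>();

        @Override
        public void comment(char[] ch, int start, int length) {
            comments.add(new String(ch, start, length).trim());
        }
    }

    // Parses the XML text and returns every comment the parser reported
    static List<String> commentsIn(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CommentCollector collector = new CommentCollector();
        parser.setContentHandler(collector);
        // A property takes an object as its value, unlike a boolean feature
        parser.setProperty(LEXICAL_HANDLER, collector);
        parser.parse(new InputSource(new StringReader(xml)));
        return collector.comments;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!-- Java and XML --><Book><Title>Java and XML</Title></Book>";
        System.out.println(commentsIn(xml));
    }
}
```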

The most convenient aspect of these methods is that they allow simple addition and modification of features. Although new or updated features will require a parser implementation to add supporting code, the method by which features and properties are accessed remains standard, as well as simple; only a new URI need be defined. Regardless of the complexity (or obscurity) of new XML-related ideas, this robust set of four methods should be sufficient to allow parsers to implement the new ideas.
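
Because features are identified only by URI, an application can also ask a parser whether it knows a given feature before relying on it. The sketch below shows one way this might be done: SAX signals an unknown URI with SAXNotRecognizedException, and a known feature whose value cannot be reported at the moment with SAXNotSupportedException. The class name FeatureProbe, the method isRecognized, and the example.com URI are made up for illustration.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

final class FeatureProbe {
    // Returns true if the parser knows the feature URI at all
    static boolean isRecognized(String featureUri) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        try {
            parser.getFeature(featureUri);
            return true;
        } catch (SAXNotRecognizedException e) {
            // The parser has never heard of this URI
            return false;
        } catch (SAXNotSupportedException e) {
            // The URI is known, but its value cannot be reported right now
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isRecognized("http://xml.org/sax/features/validation"));
        // Hypothetical URI that no parser is expected to define
        System.out.println(isRecognized("http://example.com/features/hypothetical"));
    }
}
```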

Turning on Validation

So far, we have talked about how to set features and properties, but not about the features and properties themselves. In this chapter, we are most concerned with ensuring document validation during parsing. To illustrate the importance of these methods, a little history lesson is in order. In SAX 1.0, parser implementations had to provide their own (proprietary) solutions for parsing with and without validation. With no standard mechanism for turning validation on or off, the easiest way for a vendor to keep its parser usable through the standard interface was to provide two independent parser classes. For example, to use the early versions of Sun’s Project X parser without validation, the code fragment in Example 5-1 would be employed.

Example 5-1. Using a Non-Validating Parser with SAX 1.0

try {
    // Register a parser with SAX
    Parser parser = 
        ParserFactory.makeParser(
            "com.sun.xml.parser.Parser");
                    
    // Parse the document
    parser.parse(uri);  
  
} catch (Exception e) {
    e.printStackTrace(  );
}

Because no standard mechanism existed for requesting validation, a different class had to be loaded; this new class is an almost identical implementation of the SAX 1.0 Parser interface that performs validation. The code employed to use this parser is almost identical (see Example 5-2), with the exception of the class loaded for parsing.

Example 5-2. Using a Validating Parser with SAX 1.0

try {
    // Register a parser with SAX - use the validating parser
    Parser parser = 
        ParserFactory.makeParser(
            "com.sun.xml.parser.ValidatingParser");
            
    // Parse the document
    parser.parse(uri);  
  
} catch (Exception e) {
    e.printStackTrace(  );
}

In addition to requiring that source code be changed and recompiled whenever validation is turned on or off, this presents a little-realized problem in rolling out production-ready code that parses XML. A standard development environment will use code that validates all application-produced XML. This validation, although costly for performance, can ensure that the application is always producing correct XML documents, or that correct XML documents are always being received as input for the application’s components. Often, once the application has been thoroughly tested, this validation can be removed, resulting in a significant performance gain in production. Removing validation is reasonable in this situation because thorough testing has confirmed correct XML in development, but the change forces a source code modification and recompilation. Although this may sound fairly trivial, many companies do not allow code to go into production that has not run unchanged for a set length of time, often days if not weeks. This minor change to turn off validation can result in additional testing cycles, which are often redundant, and a lengthier time to market for applications.

A common argument here is that the name of the parser class to be used can be loaded from a properties file (we talked about this in Chapter 2, regarding XML application portability). However, consider the significance of changing a complete parser implementation class just before going into production. This is not a minor change, and should be tested thoroughly. Compare this to changing the value of a single feature (supposing that the value for the SAX validation feature is kept in a similar properties file), and it is easy to determine which solution is preferred.
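
The properties-file approach to a feature value might look like the following sketch. The property key sax.validation and the class name ConfiguredParser are hypothetical, chosen only for this illustration; the configuration is loaded from a string so that the example is self-contained, where a deployed application would read it from a file. The parser is assumed to come from JAXP's SAXParserFactory.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

final class ConfiguredParser {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    // Hypothetical configuration key; not defined by SAX
    static final String VALIDATION_KEY = "sax.validation";

    // Builds a parser whose validation feature follows the configuration;
    // a missing key is treated as "false"
    static XMLReader fromProperties(Properties config) throws Exception {
        boolean validate =
            Boolean.parseBoolean(config.getProperty(VALIDATION_KEY, "false"));
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setFeature(VALIDATION, validate);
        return parser;
    }

    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        // Stand-in for loading a real properties file
        config.load(new StringReader("sax.validation=true\n"));
        XMLReader parser = fromProperties(config);
        System.out.println("validating: " + parser.getFeature(VALIDATION));
    }
}
```

With this arrangement, moving from development to production changes one line of configuration, while the parser class and the compiled code stay the same.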

For all these reasons, SAX 2.0 added the methods we have been discussing to the XMLReader interface. With these methods, we can enable validation by using the URI specific to setting validation: http://xml.org/sax/features/validation. We could also request parsing of external entities and namespace processing, but for now we will simply add the validation feature to our parser, as shown in Example 5-3.

Example 5-3. Turning On Validation

// Get instances of our handlers
ContentHandler contentHandler = new MyContentHandler(  );
ErrorHandler errHandler = new ErrHandler(  );

try {
    // Instantiate a parser
    XMLReader parser = 
        XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");        

    // Register the content handler
    parser.setContentHandler(contentHandler);
    
    // Register the error handler
    parser.setErrorHandler(errHandler);

    // Request validation through the standard SAX 2.0 feature URI
    parser.setFeature("http://xml.org/sax/features/validation", true);
        
    // Parse the document
    parser.parse(uri);

} catch (IOException e) {
    System.out.println("Error reading URI: " + e.getMessage(  ));
} catch (SAXException e) {
    System.out.println("Error in parsing: " + e.getMessage(  ));
}

With these straightforward changes, we are now ready to modify our sample XML file to again include the DTD reference and entity reference (which we commented out in an earlier chapter):

<?xml version="1.0"?>

<!-- We don't need these yet
  <?xml-stylesheet href="XSLJavaXML.html.xsl" type="text/xsl"?>
  <?xml-stylesheet href="XSLJavaXML.wml.xsl" type="text/xsl" 
                   media="wap"?>
  <?cocoon-process type="xslt"?>
-->

<!DOCTYPE JavaXML:Book SYSTEM "DTDJavaXML.dtd">

<!-- Java and XML -->
<JavaXML:Book xmlns:JavaXML="http://www.oreilly.com/catalog/javaxml/">
 <JavaXML:Title>Java and XML</JavaXML:Title>
 <JavaXML:Contents>
...
<!-- Uncomment the entity reference as well -->
<JavaXML:Copyright>&OReillyCopyright;</JavaXML:Copyright>

Make sure the DTD we created in the last chapter is in the directory specified here. Before running the example, also make sure you are connected to the Internet; remember that during validation, the parser attempts to resolve every entity reference in the document. Our example file has such an entity reference: OReillyCopyright. In our DTD, this entity refers to the URI http://www.oreilly.com/catalog/javaxml/docs/copyright.xml. If this URI is not available when validation takes place, the parse will report errors. If you do not have Internet access, or do not want to use it, you can replace the reference with a local file reference. For example, you may create a one-line text file like Example 5-4.

Example 5-4. Local Copyright File

This is a sample shared copyright file.

Save this file in a directory that is accessible by the parser program, and replace the DTD entity declaration with the path to this new file:

<!ENTITY OReillyCopyright SYSTEM 
         "entities/copyright.txt">

In this example, the text file should be saved as copyright.txt in a subdirectory named entities/. With this change, you are ready to run the sample program on the example XML file.
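
Before running against the full sample, the effect of the validation feature can be checked in isolation. The sketch below is not the chapter's sample program: it uses a small made-up document with an internal DTD (so no external files or network access are needed), an anonymous error handler that simply collects messages, and a parser obtained from JAXP's SAXParserFactory. The class name ValidationDemo and the method validityErrors are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ValidationDemo {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Made-up document: well-formed, but the DTD requires a Title child
    // and Book contains an undeclared Author element instead
    static final String INVALID_XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE Book [\n"
      + "  <!ELEMENT Book (Title)>\n"
      + "  <!ELEMENT Title (#PCDATA)>\n"
      + "]>\n"
      + "<Book><Author>Unknown</Author></Book>";

    // Parses the XML and returns the messages passed to ErrorHandler.error()
    static List<String> validityErrors(String xml, boolean validate) throws Exception {
        List<String> errors = new ArrayList<>();
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setFeature(VALIDATION, validate);
        parser.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity violations arrive here as recoverable errors
                errors.add("Line " + e.getLineNumber() + ": " + e.getMessage());
            }
        });
        parser.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("With validation:    " + validityErrors(INVALID_XML, true));
        System.out.println("Without validation: " + validityErrors(INVALID_XML, false));
    }
}
```

The document is well-formed in both runs, so parsing completes either way; only the run with the feature enabled should hand validity problems to the error handler.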
