Your knowledge base and accompanying bag of XML tricks should be starting to feel a little more solid by now. You can create XML, use the Java SAX classes to parse through that XML, and now constrain that XML. This leads us to the next logical step: validating XML with Java. Without the ability to validate XML, business-to-business and inter-application communication becomes significantly more difficult; while constraints enable portability of our data, validity ensures its consistency. In other words, being able to constrain a document doesn’t help much if we can’t ensure that those constraints are enforced within our XML applications.
In this chapter, we will look at using additional SAX classes and interfaces to enforce validity constraints in our XML documents. We will examine how to set features and properties of a SAX-compliant parser, allowing easy configuration of validation, namespace handling, and other parser functionality. In addition, the errors and warnings that can occur with validating parsers will be detailed, filling in the blanks from earlier discussions on the SAX error handlers.
With the wealth of XML-related specifications and technologies emerging from the World Wide Web Consortium (W3C), adding support for any new feature or property of an XML parser has become difficult. Many parser implementations have added proprietary extensions or methods at the cost of the portability of the code. While these software packages may implement the SAX XMLReader interface, the methods for setting document and schema validation, namespace support, and other core features are not standard across parser implementations. To address this, SAX 2.0 defines a standard mechanism for setting important properties and features of a parser that allows the addition of new properties and features as they are accepted by the W3C without the use of proprietary extensions or methods.
Lucky for us, SAX 2.0 includes the methods needed for setting properties and features in the XMLReader interface. This means we have to change little of our existing code to request validation, set the namespace separator, and handle other feature and property requests. The methods used for these purposes are outlined in Table 5-1.
Table 5-1. Property and Feature Methods
For each of these, the ID of a specific property or feature is a URI. The core set of features and properties is listed in Appendix B. Additional documentation on features and properties supported by your vendor’s XML parser should also be available. Keep in mind, though, that these URIs are similar to namespace URIs; they are only used as associations for particular features. Good parsers ensure that you do not need network access to resolve these features; in this sense, you can think of them as simple constants that happen to be in URI form. These methods are simply invoked and the URI is de-referenced locally, often to a constant representing what action in the parser needs to be taken.
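To make the "constants in URI form" idea concrete, here is a minimal sketch that asks a parser whether it recognizes a feature ID. It assumes the SAX parser bundled with the JDK, obtained through JAXP; the class name FeatureProbe, its helper methods, and the example.com feature ID are hypothetical and exist only for illustration. A SAX 2.0 parser is expected to signal an unknown ID with a SAXNotRecognizedException rather than by fetching the URI.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureProbe {
    // Standard SAX 2.0 feature ID for validation.
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    // Hypothetical ID, used here only to show the "not recognized" path.
    static final String UNKNOWN = "http://example.com/sax/features/no-such-feature";

    // Assumption: use the JDK's bundled parser; any SAX 2.0 XMLReader would do.
    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    // Returns true if the parser knows the feature ID.
    static boolean isRecognized(XMLReader reader, String featureId) {
        try {
            reader.getFeature(featureId);
            return true;
        } catch (SAXNotRecognizedException e) {
            // The parser has no idea what this ID means.
            return false;
        } catch (SAXNotSupportedException e) {
            // The ID is known, but its value cannot be read right now.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        System.out.println("validation recognized: " + isRecognized(reader, VALIDATION));
        System.out.println("unknown recognized: " + isRecognized(reader, UNKNOWN));
    }
}
```

A check like this can be useful at startup, before relying on a vendor-specific feature that another parser might not offer.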
In the parser configuration context, a property requires some arbitrary object to be usable. For example, for lexical handling, a LexicalHandler implementation class might be supplied as the value for the property. In contrast, a feature is a flag used by the parser to indicate whether a certain type of processing should occur. Common features are validation, namespace support, and including external parameter entities.
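The distinction can be sketched in a few lines: a feature takes a boolean, while a property takes an object. The example below assumes the JDK's bundled SAX parser and uses the standard SAX 2.0 IDs for namespace processing and the lexical handler; the class name FeaturePropertySketch and its configure( ) method are hypothetical. DefaultHandler2 is used only as a convenient do-nothing LexicalHandler.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

class FeaturePropertySketch {
    // A feature: a boolean flag that switches a kind of processing on or off.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    // A property: an ID whose value is an arbitrary object.
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    // Assumption: the JDK's bundled parser stands in for "a SAX 2.0 parser".
    static XMLReader configure(LexicalHandler handler) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);          // flag
        reader.setProperty(LEXICAL_HANDLER, handler); // object
        return reader;
    }

    public static void main(String[] args) throws Exception {
        LexicalHandler handler = new DefaultHandler2();
        XMLReader reader = configure(handler);
        System.out.println("namespaces on: " + reader.getFeature(NAMESPACES));
        System.out.println("handler registered: "
            + (reader.getProperty(LEXICAL_HANDLER) == handler));
    }
}
```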
The most convenient aspect of these methods is that they allow simple addition and modification of features. Although new or updated features will require a parser implementation to add supporting code, the method by which features and properties are accessed remains standard, as well as simple; only a new URI need be defined. Regardless of the complexity (or obscurity) of new XML-related ideas, this robust set of four methods should be sufficient to allow parsers to implement the new ideas.
So far, we have talked about how to set features and properties, but not about those functionalities themselves. In this chapter, we are most concerned with ensuring document validation during parsing. To illustrate the importance of these methods, a little history lesson is in order. In SAX 1.0, parser implementations had to provide their own (proprietary) solutions to handle parsing with validation and parsing without. With no standard mechanism for turning validation on or off, the easiest way for a vendor to keep its parser usable through the standard interface was to provide two independent parsing classes. For example, to use the early versions of Sun’s Project X parser without validation, the code fragment in Example 5-1 would be employed.
Example 5-1. Using a Non-Validating Parser with SAX 1.0
try {
    // Register a parser with SAX
    Parser parser =
        ParserFactory.makeParser(
            "com.sun.xml.parser.Parser");

    // Parse the document
    parser.parse(uri);

} catch (Exception e) {
    e.printStackTrace( );
}
Because no standard mechanism existed for requesting validation, a different class had to be loaded; this new class is an almost identical implementation of the SAX 1.0 Parser interface that performs validation. The code employed to use this parser is almost identical (see Example 5-2), with the exception of the class loaded for parsing.
Example 5-2. Using a Validating Parser with SAX 1.0
try {
    // Register a parser with SAX - use the validating parser
    Parser parser =
        ParserFactory.makeParser(
            "com.sun.xml.parser.ValidatingParser");

    // Parse the document
    parser.parse(uri);

} catch (Exception e) {
    e.printStackTrace( );
}
In addition to having to change and recompile source code when validation is turned on or off, this presents a little-realized problem in rolling out production-ready code that parses XML. A standard development environment will use code that validates all application-produced XML. This validation, although costly for performance, can ensure that the application is always producing correct XML documents, or that correct XML documents are always being received as input for the application’s components. Often, this validation, once the application has been thoroughly tested, can be turned off, resulting in a significant performance gain in production. It is possible in this situation to remove validation from the parser’s behavior because thorough testing has confirmed correct XML in development, but this change forces a source code modification and recompilation. Although this may sound fairly trivial, many companies do not allow code to go into production that has not run unchanged for a set length of time, often days if not weeks. This minor change to turn off validation can result in additional testing cycles, which are often redundant, and a lengthier time to market for applications.
A common argument here is that the name of the parser class to be used can be loaded from a properties file (we talked about this in Chapter 2, regarding XML application portability). However, consider the significance of changing a complete parser implementation class just before going into production. This is not a minor change, and should be tested thoroughly. When compared to changing the value of a feature set (supposing that the value to set the SAX validation feature is kept in a similar properties file), it is easy to determine which solution is preferred.
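As a rough sketch of the preferred approach, the validation flag can be read from a properties file and handed to setFeature( ), so that switching validation off for production requires no recompilation. The property key sax.validation, the class name ConfiguredReader, and the use of the JDK's bundled parser are all assumptions made for illustration; the properties are loaded from a string here only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class ConfiguredReader {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    // Hypothetical key; any name agreed on for the properties file would do.
    static final String KEY = "sax.validation";

    // Build a reader whose validation setting comes from configuration, not code.
    static XMLReader fromProperties(Properties props) throws Exception {
        // Validation stays off when the key is missing.
        boolean validate = Boolean.parseBoolean(props.getProperty(KEY, "false"));
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(VALIDATION, validate);
        return reader;
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // In a real deployment this would be read from a file on disk.
        props.load(new StringReader("sax.validation=true\n"));
        XMLReader reader = fromProperties(props);
        System.out.println("Validating: " + reader.getFeature(VALIDATION));
    }
}
```

The parser class stays the same in development and production; only the value in the properties file changes.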
For all these reasons, SAX 2.0 added the methods we have been discussing to the XMLReader interface. With these methods, we can enable validation by using the URI specific to setting validation: http://xml.org/sax/features/validation. We could also request parsing of external entities and namespace processing, but for now we will simply add the validation feature to our parser, as shown in Example 5-3.
Example 5-3. Turning On Validation
// Get instances of our handlers
ContentHandler contentHandler = new MyContentHandler( );
ErrorHandler errHandler = new ErrHandler( );

try {
    // Instantiate a parser
    XMLReader parser =
        XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");

    // Register the content handler
    parser.setContentHandler(contentHandler);

    // Register the error handler
    parser.setErrorHandler(errHandler);

    // Turn on validation
    parser.setFeature("http://xml.org/sax/features/validation", true);

    // Parse the document
    parser.parse(uri);

} catch (IOException e) {
    System.out.println("Error reading URI: " + e.getMessage( ));
} catch (SAXException e) {
    System.out.println("Error in parsing: " + e.getMessage( ));
}
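To see the effect of the validation feature in isolation, the following self-contained sketch parses two tiny documents with an inline DTD and collects whatever the parser reports through ErrorHandler.error( ). It is not the chapter's sample program: the class name ValidationDemo, the two test documents, and the use of the JDK's bundled parser are assumptions chosen so that the example runs without any files or network access.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidationDemo {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Hypothetical inline DTD: a book must contain exactly one title.
    static final String DTD =
        "<!DOCTYPE book [ <!ELEMENT book (title)> <!ELEMENT title (#PCDATA)> ]>";
    static final String VALID =
        "<?xml version=\"1.0\"?>" + DTD + "<book><title>Java and XML</title></book>";
    // Well-formed, but breaks the DTD: author is never declared.
    static final String INVALID =
        "<?xml version=\"1.0\"?>" + DTD + "<book><author>Unknown</author></book>";

    // Parse the XML and return the messages reported as recoverable errors.
    static List<String> validityErrors(String xml, boolean validate) throws Exception {
        final List<String> errors = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(VALIDATION, validate);
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity violations arrive here instead of stopping the parse.
                errors.add(e.getMessage());
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid, validating: " + validityErrors(VALID, true).size());
        System.out.println("invalid, validating: " + validityErrors(INVALID, true).size());
        System.out.println("invalid, not validating: " + validityErrors(INVALID, false).size());
    }
}
```

With validation switched off, the invalid document is still well-formed, so the parser should have nothing to report; only the feature setting changes between the last two calls.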
With these straightforward changes, we are now ready to modify our sample XML file to again include the DTD reference and entity reference (which we commented out in an earlier chapter):
<?xml version="1.0"?>

<!-- We don't need these yet
<?xml-stylesheet href="XSLJavaXML.html.xsl" type="text/xsl"?>
<?xml-stylesheet href="XSLJavaXML.wml.xsl" type="text/xsl"
                 media="wap"?>
<?cocoon-process type="xslt"?>
-->

<!DOCTYPE JavaXML:Book SYSTEM "DTDJavaXML.dtd">

<!-- Java and XML -->
<JavaXML:Book xmlns:JavaXML="http://www.oreilly.com/catalog/javaxml/">
 <JavaXML:Title>Java and XML</JavaXML:Title>
 <JavaXML:Contents>
...
 <!-- Uncomment the entity reference as well -->
 <JavaXML:Copyright>&OReillyCopyright;</JavaXML:Copyright>
Make sure you have the DTD we created in the last chapter in the directory specified here. Before running the example, you need to make sure you are connected to the Internet; remember that during validation, the parser attempts to resolve any entity references you make. In our example file, we have such an entity reference: OReillyCopyright. In our DTD, we referenced the URI http://www.oreilly.com/catalog/javaxml/docs/copyright.xml. When validation takes place, if this URI is not available, validation errors will occur. If you do not have Internet access, or do not want to use that access, you can replace the reference with a local file reference. For example, you may create a one-line text file like Example 5-4.
Save this file in a directory that is accessible by the parser program, and replace the DTD entity declaration with the path to this new file:
<!ENTITY OReillyCopyright SYSTEM "entities/copyright.txt">
In this example, the text file should be saved as copyright.txt in a subdirectory named entities/. With this change, you are ready to run the sample program on the example XML file.