A Xerces sample program allows you to validate more than one XML instance at a time against an XML Schema. This hack shows you how to use the Java class xni.XMLGrammarBuilder to do it.
This book describes several online and command-line validators that let you check whether a document conforms to a W3C XML Schema definition. Some are faster than others, and some are more suitable for a particular platform. The special advantage of the Xerces Java xni.XMLGrammarBuilder sample application (which, being a Java program, runs on any platform) is its ability to validate multiple documents in a single run. This sample application is packaged in the xercesSamples.jar file included with the Java Xerces distribution (http://xml.apache.org/xerces2-j/), which is part of the file archive that came with the book.
If you work with XML, you’ve probably received an email at work that says “here’s the data” and includes a ZIP file full of XML files or, worse, a bunch of files all attached individually. Before doing anything with those files, you probably want to validate them to check whether the sender is passing along any problems to you. You could write a Perl script to generate a batch file that calls your favorite parser for each file. Alternatively, you could enter the command to parse the first file, press the up-arrow key to retrieve that command, modify it, run it again, and repeat these steps for every file. Or you could use the xni.XMLGrammarBuilder utility and do it all in one command. (The program stores a compiled version of the schema in memory and then reuses it for each document instance. Because of this, the integrity checks it performs while compiling the schema also make it a useful schema development tool; see [Hack #71].)
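The compile-once, validate-many approach described above can be sketched with the standard JAXP validation API (javax.xml.validation). This is a swapped-in technique, not the XNI code that xni.XMLGrammarBuilder itself uses, and the class name MultiValidate and its validateAll method are hypothetical. SchemaFactory.newSchema compiles the schema a single time. Each document then gets a fresh Validator from that same Schema object. Because the ErrorHandler's error method does not throw, validation continues past the first problem in a document.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical example class: compile one schema, then validate many instances with it.
public class MultiValidate {

    // Validates every instance against the schema and returns how many
    // instances produced at least one error.
    public static int validateAll(File schemaFile, File... instances)
            throws SAXException, IOException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The schema is read and compiled once; the Schema object is reusable.
        Schema schema = factory.newSchema(schemaFile);

        int failed = 0;
        for (File instance : instances) {
            int[] errorCount = {0};
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    System.err.println("[Warning] " + instance.getName() + ": " + e.getMessage());
                }

                @Override
                public void error(SAXParseException e) {
                    // Record the error and keep validating this document.
                    errorCount[0]++;
                    System.err.println("[Error] " + instance.getName() + ": " + e.getMessage());
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    errorCount[0]++;
                    System.err.println("[Fatal Error] " + instance.getName() + ": " + e.getMessage());
                    throw e; // a document that is not well-formed cannot be validated further
                }
            });
            try {
                validator.validate(new StreamSource(instance));
            } catch (SAXException e) {
                // fatalError normally reported this already; count it if not.
                if (errorCount[0] == 0) {
                    errorCount[0]++;
                    System.err.println("[Fatal Error] " + instance.getName() + ": " + e.getMessage());
                }
            }
            if (errorCount[0] > 0) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) throws SAXException, IOException {
        if (args.length < 2) {
            System.err.println("usage: java MultiValidate schema.xsd instance.xml ...");
            return;
        }
        File[] instances = new File[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            instances[i - 1] = new File(args[i]);
        }
        int failed = validateAll(new File(args[0]), instances);
        System.out.println(failed + " of " + instances.length + " documents had errors");
    }
}
```

Invoked as java MultiValidate multidoc.xsd multidoc1.xml multidoc2.xml, it would report problems in both files. The wording of the messages depends on the JAXP implementation in use.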
The following listings show you two short XML documents. I won’t take up space showing you the multidoc.xsd schema that they point to; take my word for it that ZZ is not one of the valid zone values and oomph is not a valid child of the para element. Here is multidoc1.xml:
<sample zone="ZZ"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="multidoc.xsd">
  <title>Peyton Place</title>
  <para>Indian summer is like a woman.</para>
</sample>
Here is multidoc2.xml:
<sample zone="Z1"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="multidoc.xsd">
  <title>Moby Dick</title>
  <para>Call me Ishmael.</para>
  <para>I <oomph>alone</oomph> survived to tell the tale.</para>
</sample>
Before executing the command that follows, make sure that your
classpath includes both the xercesImpl.jar and
the xercesSamples.jar files (Version 2.6.2 or
later) that come with the Java Xerces distribution. You can download
the Xerces distribution from http://xml.apache.org/xerces2-j/download.cgi.
In the following command line, the -a switch identifies the XSD schema, and -i introduces the list of documents to validate:
java -cp xercesImpl.jar;xercesSamples.jar xni.XMLGrammarBuilder -a multidoc.xsd -i multidoc1.xml multidoc2.xml
In a Unix environment, use a colon (:) instead of a semicolon between the JAR filenames. xni.XMLGrammarBuilder lists each document’s problems:
[Error] multidoc1.xml:3:54: cvc-enumeration-valid: Value 'ZZ' is not facet-valid with respect to enumeration '[Z1, Z2, Z3, Z4, Z5, Z6]'. It must be a value from the enumeration.
[Error] multidoc1.xml:3:54: cvc-attribute.3: The value 'ZZ' of attribute 'zone' on element 'sample' is not valid with respect to its type, 'zoneCodes'.
[Error] multidoc2.xml:6:18: cvc-complex-type.2.4.a: Invalid content was found starting with element 'oomph'. One of '{"":emph}' is expected.
The error in multidoc1.xml generated two error messages, and the error in multidoc2.xml generated one, each with information about the location and nature of the error.
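The location in each message (file, line, and column) comes from the data a SAX parse exception carries. The following is a minimal sketch that assumes only the standard org.xml.sax.SAXParseException class. The hypothetical ErrorFormat.format method builds a line in the same [Error] file:line:column: message layout. It is an illustration, not the code the Xerces sample uses.

```java
import org.xml.sax.SAXParseException;

// Hypothetical helper: formats a SAXParseException as "[severity] file:line:column: message".
public class ErrorFormat {

    public static String format(String severity, SAXParseException e) {
        String systemId = e.getSystemId();
        // Keep only the file name from the system ID (a URI), not the full path.
        String file = systemId == null
            ? "(unknown)"
            : systemId.substring(systemId.lastIndexOf('/') + 1);
        return "[" + severity + "] " + file + ":" + e.getLineNumber() + ":"
            + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        // Location values taken from the multidoc2.xml error shown above.
        SAXParseException e = new SAXParseException(
            "cvc-complex-type.2.4.a: Invalid content was found starting with element 'oomph'.",
            null, "file:///tmp/multidoc2.xml", 6, 18);
        // prints: [Error] multidoc2.xml:6:18: cvc-complex-type.2.4.a: Invalid content was found starting with element 'oomph'.
        System.out.println(format("Error", e));
    }
}
```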
Running xni.XMLGrammarBuilder with no parameters, as in the following line, gives you an overview of its command-line options:
java -cp xercesImpl.jar;xercesSamples.jar xni.XMLGrammarBuilder
These options are shown here:
usage: java xni.XMLGrammarBuilder [-p config_file] -d uri ... | [-f|-F] -a uri ... [-i uri ...]

options:
  -p config_file:  configuration to use for instance validation
  -d               grammars to preparse are DTD external subsets
  -f  | -F         Turn on/off Schema full checking (default off)
  -a uri ...       Provide a list of schema documents
  -i uri ...       Provide a list of instance documents to validate

NOTE: both -d and -a cannot be specified!
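The -f switch turns on Xerces schema full checking, which is off by default. Full checking adds grammar constraint checks that can be time-consuming or memory-intensive; in Xerces2-J these include unique particle attribution and particle derivation restriction checking. The following is a minimal sketch of requesting the same checks through JAXP. It assumes that the SchemaFactory implementation is Xerces-based and recognizes the Xerces feature URI http://apache.org/xml/features/validation/schema-full-checking. An implementation that does not recognize the feature throws an exception, which the code reports instead of failing. The class name FullCheckingFactory is hypothetical.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical example class: a schema factory with Xerces full checking requested.
public class FullCheckingFactory {

    // Xerces feature URI for schema full checking (what the -f switch turns on).
    static final String FULL_CHECKING =
        "http://apache.org/xml/features/validation/schema-full-checking";

    // Returns a W3C XML Schema factory with full checking set as requested,
    // or an unmodified factory if the implementation rejects the feature.
    public static SchemaFactory create(boolean fullChecking) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.setFeature(FULL_CHECKING, fullChecking);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            System.err.println("Full checking not available: " + e.getMessage());
        }
        return factory;
    }

    public static void main(String[] args) throws SAXException {
        if (args.length != 1) {
            System.err.println("usage: java FullCheckingFactory schema.xsd");
            return;
        }
        // Compiling the schema is when the extra grammar checks run.
        Schema schema = create(true).newSchema(new File(args[0]));
        System.out.println("Schema compiled: " + args[0]);
    }
}
```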
See the samples directory and documentation that accompanies Xerces Java for more detailed documentation on xni.XMLGrammarBuilder.
—Bob DuCharme