Compared to the alternatives, RELAX NG schemas are easy to learn and use, and the more you use them, the more convinced of their merits you become.
RELAX NG (http://www.relaxng.org) is a powerful schema language with a simple syntax. Originally, RELAX NG was developed in a small OASIS technical committee led by James Clark. It is based on ideas from Clark’s TREX (http://www.thaiopensource.com/trex/) and Murata Makoto’s Relax (http://www.xml.gr.jp/relax/), and its first committee spec was published on December 3, 2001 (http://www.oasis-open.org/committees/relax-ng/spec.html). A tutorial is also available (http://www.oasis-open.org/committees/relax-ng/tutorial.html). Recently, RELAX NG became an international standard under ISO as ISO/IEC 19757-2:2004, Information technology— Document Schema Definition Language (DSDL)—Part 2: Regular-grammar-based validation—RELAX NG (see http://www.y12.doe.gov/sgml/sc34/document/0458.htm).
RELAX NG schemas may be written in either XML or a compact syntax. This hack demonstrates both, using schemas for a simple document, time.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<time timezone="PST">
  <hour>11</hour>
  <minute>59</minute>
  <second>59</second>
  <meridiem>p.m.</meridiem>
  <atomic signal="true"/>
</time>
Here is a RELAX NG schema for time.xml called time.rng:
<element name="time" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="timezone"/>
  <element name="hour"><text/></element>
  <element name="minute"><text/></element>
  <element name="second"><text/></element>
  <element name="meridiem"><text/></element>
  <element name="atomic">
    <attribute name="signal"/>
  </element>
</element>
At a glance, you can see how simple the syntax is. Each element is defined with an element element, and each attribute with an attribute element. The namespace URI for RELAX NG is http://relaxng.org/ns/structure/1.0. The document element in this schema happens to be element, but any element in RELAX NG that defines a pattern may be used as a document element (grammar may also be used, even though it doesn’t define a pattern). Each of the elements and attributes defined in this schema has text content. For elements this is indicated by the text element, and for attributes it is the default. For example, <attribute name="signal"/> and <attribute name="signal"><text/></attribute> are equivalent.
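Because grammar can also serve as the document element, time.rng could be written with an explicit start element instead. The sketch below adds nothing but the grammar and start wrappers, so it should accept exactly the same instances as time.rng. The RELAX NG specification's simplification step performs this same wrapping on a schema whose document element is a bare pattern.

```xml
<!-- Sketch: time.rng rewritten with grammar/start; same patterns, same instances -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="time">
      <attribute name="timezone"/>
      <element name="hour"><text/></element>
      <element name="minute"><text/></element>
      <element name="second"><text/></element>
      <element name="meridiem"><text/></element>
      <element name="atomic">
        <!-- equivalent to <attribute name="signal"><text/></attribute> -->
        <attribute name="signal"/>
      </element>
    </element>
  </start>
</grammar>
```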
You can validate documents with RELAX NG using xmllint [Hack #9]. To validate time.xml against time.rng, type this command in a shell:
xmllint --relaxng time.rng time.xml
The response upon success will be:
<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<time timezone="PST">
<hour>11</hour>
<minute>59</minute>
<second>59</second>
<meridiem>p.m.</meridiem>
<atomic signal="true"/>
</time>
time.xml validates
xmllint echoes the document to standard output and, on the last line, reports that the document validates. You can list one or more XML instances at the end of the command line for validation.
You can also validate documents with RELAX NG using James Clark’s Jing (http://www.thaiopensource.com/relaxng/jing.html). You can download the latest version from http://www.thaiopensource.com/download/. To validate time.xml against time.rng, use this command:
java -jar jing.jar time.rng time.xml
When Jing is silent after this command, it means that time.xml is valid with regard to time.rng. Jing, by the way, can accept one or more instance documents on the command line.
Jing also has a Windows 32 version, jing.exe, downloadable from the same location (http://www.thaiopensource.com/download/). In my tests, jing.exe runs faster than jing.jar, as you might expect.
At a Windows command prompt, run jing.exe like this:
jing time.rng time.xml
Example 5-8 is a more complex, yet more precise, version of time.rng called precise.rng, which refines what is permitted in an instance.
Example 5-8. precise.rng
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <start>
    <ref name="Time"/>
  </start>

  <define name="Time">
    <element name="time">
      <attribute name="timezone">
        <ref name="Timezones"/>
      </attribute>
      <element name="hour">
        <ref name="Hours"/>
      </element>
      <element name="minute">
        <ref name="MinutesSeconds"/>
      </element>
      <element name="second">
        <ref name="MinutesSeconds"/>
      </element>
      <element name="meridiem">
        <choice>
          <value>a.m.</value>
          <value>p.m.</value>
        </choice>
      </element>
      <element name="atomic">
        <attribute name="signal">
          <choice>
            <value>true</value>
            <value>false</value>
          </choice>
        </attribute>
      </element>
    </element>
  </define>

  <define name="Timezones">
    <!-- http://www.timeanddate.com/library/abbreviations/timezones/ -->
    <choice>
      <value>GMT</value>
      <value>UTC</value>
      <value>ACDT</value>
      <value>ACST</value>
      <value>ADT</value>
      <value>AEDT</value>
      <value>AEST</value>
      <value>AKDT</value>
      <value>AKST</value>
      <value>AST</value>
      <value>AWST</value>
      <value>BST</value>
      <value>CDT</value>
      <value>CEST</value>
      <value>CET</value>
      <value>CST</value>
      <value>CXT</value>
      <value>EDT</value>
      <value>EEST</value>
      <value>EET</value>
      <value>EST</value>
      <value>HAA</value>
      <value>HAC</value>
      <value>HADT</value>
      <value>HAE</value>
      <value>HAP</value>
      <value>HAR</value>
      <value>HAST</value>
      <value>HAT</value>
      <value>HAY</value>
      <value>HNA</value>
      <value>HNC</value>
      <value>HNE</value>
      <value>HNP</value>
      <value>HNR</value>
      <value>HNT</value>
      <value>HNY</value>
      <value>IST</value>
      <value>MDT</value>
      <value>MESZ</value>
      <value>MEZ</value>
      <value>MST</value>
      <value>NDT</value>
      <value>NFT</value>
      <value>NST</value>
      <value>PDT</value>
      <value>PST</value>
      <value>WEST</value>
      <value>WET</value>
      <value>WST</value>
    </choice>
  </define>

  <define name="Hours">
    <data type="string"><param name="pattern">[0-1][0-9]|2[0-3]</param></data>
  </define>

  <define name="MinutesSeconds">
    <data type="integer">
      <param name="minInclusive">0</param>
      <param name="maxInclusive">59</param>
    </data>
  </define>

</grammar>
This schema uses grammar as its document element (line 1). RELAX NG can use the XML Schema datatype library, which this schema declares with the datatypeLibrary attribute on line 2. The start element (line 4) indicates where instances start; that is, what the document element of an instance will be. The ref element (line 5) refers to a named definition (define), which starts on line 8. Named definitions do not conflict with other named structures such as element and attribute. This means that you could have a definition named time and an element named time with no conflicts. (I use Time as the name of the definition just as a personal convention.)
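The following grammar is a minimal illustration of that point. It is a made-up fragment, not part of precise.rng, and it reduces the content of time to an attribute plus text. It uses time both as a definition name and as an element name, and RELAX NG treats the two as unrelated:

```xml
<!-- Illustrative sketch only: a definition and an element both named "time" -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <!-- ref always names a definition, never an element -->
    <ref name="time"/>
  </start>
  <define name="time">
    <!-- element names and definition names do not clash -->
    <element name="time">
      <attribute name="timezone"/>
      <text/>
    </element>
  </define>
</grammar>
```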
The possible values for the timezone attribute (line 10) are defined in the Timezones definition (line 39). The choice element (line 41) specifies that the value of timezone must match one of its 50 enumerated value elements. The same technique constrains the content of the meridiem element (line 22) and the value of the signal attribute (line 29).
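These enumerations make precise.rng much stricter than time.rng. The hypothetical instance below is not one of the hack's files. It is valid against time.rng, but precise.rng should reject it. Pacific is not one of the Timezones values, pm matches neither a.m. nor p.m., and yes is neither true nor false:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical instance: valid against time.rng, invalid against precise.rng -->
<time timezone="Pacific">        <!-- not listed in Timezones -->
  <hour>11</hour>
  <minute>59</minute>
  <second>59</second>
  <meridiem>pm</meridiem>        <!-- must be "a.m." or "p.m." -->
  <atomic signal="yes"/>         <!-- must be "true" or "false" -->
</time>
```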
The content models of the hour, minute, and second elements each refer to a named definition. The hour element refers to the Hours definition (line 95). Its data element points to the XML Schema type string (line 96). This string is constrained by the param element named pattern, which corresponds to the XML Schema pattern facet. The regular expression [0-1][0-9]|2[0-3] requires the content of hour to be exactly two digits, either in the range 00 through 19 ([0-1][0-9]) or in the range 20 through 23 (2[0-3]). Together, the two alternatives allow 00 through 23. The minute and second elements both refer to the MinutesSeconds definition (line 99). Rather than use a regular expression, this definition takes a different approach: it uses a minInclusive parameter of 0 (line 101) and a maxInclusive parameter of 59 (line 102).
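The two approaches check different things. A pattern constrains the lexical form, so Hours requires exactly two digits: 07 is accepted, 7 is not. By contrast, minInclusive and maxInclusive constrain the numeric value, so MinutesSeconds accepts 7, 07, or even +7. If you wanted a value-based check for hours too, a definition along these lines should work. It is a hypothetical alternative, not part of precise.rng, and it assumes the datatypeLibrary attribute declared on precise.rng's grammar element is in scope:

```xml
<!-- Hypothetical value-based alternative to the Hours definition in precise.rng -->
<define name="Hours">
  <!-- XML Schema integer, limited to 0 through 23 by value, not by lexical form -->
  <data type="integer">
    <param name="minInclusive">0</param>
    <param name="maxInclusive">23</param>
  </data>
</define>
```

The trade-off is that this version would also accept single-digit hours such as 7, so keep the pattern if the two-digit form matters.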
Test precise.rng by validating time.xml against it with xmllint:
xmllint --relaxng precise.rng time.xml
Or with Jing:
java -jar jing.jar precise.rng time.xml
Or with jing.exe:
jing precise.rng time.xml
RELAX NG’s non-XML compact syntax is a pleasure to use (http://www.oasis-open.org/committees/relax-ng/compact-20021121.html). A tutorial on the compact syntax is available (http://relaxng.org/compact-tutorial-20030326.html). Its syntax is similar to XQuery’s computed constructor syntax (http://www.w3.org/TR/xquery/#id-computedConstructors). Following is a compact version of time.rng called time.rnc (the .rnc file suffix is conventional, representing the use of compact syntax):
element time {
  attribute timezone { text },
  element hour { text },
  element minute { text },
  element second { text },
  element meridiem { text },
  element atomic { attribute signal { text } }
}
The RELAX NG namespace is assumed though not declared explicitly. The element, attribute, and text keywords define elements, attributes, and text content, respectively. Sets of braces ({ }) hold content models, and commas separate patterns that must appear in sequence.
You cannot validate a document with xmllint when using compact syntax. You can, however, validate it using Jing with the -c switch. The command looks like:
java -jar jing.jar -c time.rnc time.xml
Or with jing.exe it looks like:
jing -c time.rnc time.xml
Silence is golden with Jing. In other words, if Jing reports nothing, the document is valid.
David Tolpin has developed a validator for RELAX NG’s compact syntax; it is called RNV and is written in C (http://davidashen.net/rnv.html). It is fast and is a nice piece of work. Source is available, and you can compile it on your platform using the makefile provided or by writing your own. A Windows 32 executable version is also available. Download the latest version of either from http://ftp.davidashen.net/PreTI/RNV/.
A copy of the Windows 32 executable rnv.exe (Version 1.6.1) is available in the file archive. Validate time.xml against time.rnc using this command:
rnv -p time.rnc time.xml
The -p option copies the validated document to standard output, as shown here. Without it, a successful run displays only the name of the validated file, which is the first line of the following output.
time.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<time timezone="PST">
<hour>11</hour>
<minute>59</minute>
<second>59</second>
<meridiem>p.m.</meridiem>
<atomic signal="true"/>
</time>
A nice feature of RNV is that it can check a compact schema alone, without validating an instance. This is done with the -c option:
rnv -c time.rnc
As with Jing, the sound of silence means that the compact grammar is in good shape.
Example 5-9 is a more complex yet more precise version of time.rnc called precise.rnc, which is only about 25 percent as long as its counterpart precise.rng.
Example 5-9. precise.rnc
start = Time
Time =
  element time {
    attribute timezone { Timezones },
    element hour { Hours },
    element minute { MinutesSeconds },
    element second { MinutesSeconds },
    element meridiem { "a.m." | "p.m." },
    element atomic {
      attribute signal { "true" | "false" }
    }
  }
Timezones =
  # http://www.timeanddate.com/library/abbreviations/timezones/
  "GMT" | "UTC" | "ACDT" | "ACST" | "ADT" | "AEDT" | "AEST" | "AKDT"
  | "AKST" | "AST" | "AWST" | "BST" | "CDT" | "CEST" | "CET"
  | "CST" | "CXT" | "EDT" | "EEST" | "EET" | "EST" | "HAA"
  | "HAC" | "HADT" | "HAE" | "HAP" | "HAR" | "HAST" | "HAT"
  | "HAY" | "HNA" | "HNC" | "HNE" | "HNP" | "HNR" | "HNT"
  | "HNY" | "IST" | "MDT" | "MESZ" | "MEZ" | "MST" | "NDT"
  | "NFT" | "NST" | "PDT" | "PST" | "WEST" | "WET" | "WST"
Hours = xsd:string { pattern = "[0-1][0-9]|2[0-3]" }
MinutesSeconds = xsd:integer { minInclusive = "0" maxInclusive = "59" }
The compact schema precise.rnc was generated by Trang from precise.rng (http://www.thaiopensource.com/relaxng/trang.html).
Comparing precise.rnc with precise.rng should yield many insights into the compact syntax. The start symbol (line 1) identifies the pattern for the document element, as the start element does in XML syntax. The names of definitions (lines 2, 13, 22, and 23) are followed by equals signs (=), then by the patterns they represent. These definitions are referenced by name from start (line 1) and from the content models of elements and attributes (lines 4, 5, 6, and 7). Choices of values are separated by a vertical bar (|) on lines 8, 10, and 15-21, and each of the values is quoted. Comments begin with # (line 14) and run to the end of the line, instead of beginning with <!-- and ending with -->.
The XML Schema datatype library does not have to be declared in the schema. The compact syntax predeclares the xsd prefix for it, so anything prefixed with xsd: is a datatype from the XML Schema datatype library (xsd:string on line 22 and xsd:integer on line 23). The pattern parameter on line 22 takes a regular expression as its value. The minInclusive and maxInclusive parameters on line 23 (facets in XML Schema) define an inclusive range of 0 through 59.
Test this compact schema by validating time.xml against it with RNV:
rnv precise.rnc time.xml
with Jing:
java -jar jing.jar -c precise.rnc time.xml
or with jing.exe:
jing -c precise.rnc time.xml
Eric van der Vlist’s RELAX NG (O’Reilly) provides a complete tutorial for RELAX NG, plus a reference.
If you run into problems, a good place to post questions is the RELAX NG user list: http://relaxng.org/mailman/listinfo/relaxng-user
Sun’s Multi-schema validator by Kawaguchi Kohsuke: http://wwws.sun.com/software/xml/developers/multischema/
Tenuto, a C# validator for RELAX NG: http://sourceforge.net/projects/relaxng