DSDL is still a work in progress. It is a multipart specification, with each of the parts presenting a different schema language (except Part 1, which is an introduction, and Part 10, which is the description of the framework itself).
Part 1 is a roadmap describing DSDL and introducing each of the other parts.
Part 2 covers RELAX NG; it rewrites the RELAX NG OASIS Technical Committee specification to meet the requirements of ISO publications. Its wording is more formal than that of the OASIS specification, but the features of the language are the same. Any RELAX NG implementation that conforms to either of these two documents also conforms to the other.
DSDL Part 2 is now a Final Draft International Standard (FDIS); i.e., it has reached the final approval stage before publication as an official ISO standard.
Part 3 of DSDL describes the next release of the rule-based schema language known as Schematron. The current version of Schematron has been defined by Rick Jelliffe and other contributors as a language that expresses sets of rules as XPath expressions (or more accurately, as XSLT expressions, because XSLT functions such as document( ) are also supported in its XPath expressions). Its home page is http://www.ascc.net/xml/schematron/.
Without going into the details of the language, a Schematron schema is composed of sets of rules named patterns (these patterns shouldn’t be confused with RELAX NG patterns). Each pattern includes one or more rules. Each rule sets the context nodes under which tests are performed, and each test is performed either as an assert or as a report. An assert is a test that raises an error if it is not verified, while a report is a test that raises an error if it is verified.
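The difference between the two kinds of test can be sketched in a few lines of Python. This is only an illustration of the semantics, not of how a Schematron processor works: the context node is modelled as a plain dictionary, and the run_checks function and the sample messages are invented for the example.

```python
from typing import Callable, List, Tuple

# A check is (kind, test, message); kind is "assert" or "report".
Check = Tuple[str, Callable[[dict], bool], str]

def run_checks(context: dict, checks: List[Check]) -> List[str]:
    """Return the messages raised for one context node (modelled as a dict)."""
    messages = []
    for kind, test, message in checks:
        result = test(context)
        # An assert fires when its test fails; a report fires when it succeeds.
        if (kind == "assert" and not result) or (kind == "report" and result):
            messages.append(message)
    return messages

# Hypothetical checks, written as Python predicates instead of XPath.
checks = [
    ("assert", lambda node: "id" in node, "A book should have an id."),
    ("report", lambda node: node.get("id") == "", "The id is empty."),
]

print(run_checks({"id": ""}, checks))  # only the report fires
print(run_checks({}, checks))          # only the assert fires
```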
A fragment of a Schematron schema for our library could be:
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:title>Schematron Schema for library</sch:title>
  <sch:pattern>
    <sch:rule context="/">
      <sch:assert test="library">The document element should be "library".</sch:assert>
    </sch:rule>
    <sch:rule context="/library">
      <sch:assert test="book">There should be at least a book!</sch:assert>
      <sch:assert test="not(@*)">No attribute for library, please!</sch:assert>
    </sch:rule>
    <sch:rule context="/library/book">
      <sch:report test="following-sibling::book/@id=@id">Duplicated ID for this book.</sch:report>
      <sch:assert test="@id=concat('_', isbn)">The id should be derived from the ISBN.</sch:assert>
    </sch:rule>
    <sch:rule context="/library/*">
      <sch:assert test="self::book or self::author or self::character">This element shouldn't be here...</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
You can see from this simple example that writing a full schema with Schematron would be verbose, because it means writing a rule for each element. Each rule must spell out all the individual tests that check the content model and, where it matters, the relative order of the child elements. You can also see that Schematron does very well at expressing what are often called business rules, such as:
<sch:assert test="@id=concat('_', isbn)">The id should be derived from the ISBN.</sch:assert>
This example checks that the id attribute of a book is derived from its isbn element by adding a leading underscore.
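To make the two tests of the /library/book rule concrete, here is a hand-coded Python equivalent using the standard xml.etree.ElementTree module. It is a sketch only: the sample document and its ISBN values are made up, the check_books function is hypothetical, and a real Schematron processor evaluates the XPath expressions themselves rather than translating them by hand.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: the first and third books share an id,
# and the second id is not derived from its ISBN.
SAMPLE = """<library>
  <book id="_0836217462"><isbn>0836217462</isbn></book>
  <book id="b2"><isbn>0805033106</isbn></book>
  <book id="_0836217462"><isbn>0836217462</isbn></book>
</library>"""

def check_books(xml_text: str) -> list:
    """Return (book position, message) pairs for the two /library/book tests."""
    root = ET.fromstring(xml_text)
    books = root.findall("book")
    errors = []
    for position, book in enumerate(books):
        book_id = book.get("id")
        # report test="following-sibling::book/@id=@id"
        later_ids = [later.get("id") for later in books[position + 1:]]
        if book_id is not None and book_id in later_ids:
            errors.append((position, "Duplicated ID for this book."))
        # assert test="@id=concat('_', isbn)"
        if book_id != "_" + book.findtext("isbn", default=""):
            errors.append((position, "The id should be derived from the ISBN."))
    return errors

for position, message in check_books(SAMPLE):
    print(f"book #{position}: {message}")
```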
DSDL Part 3, the next version of Schematron, will keep this structure and add still more power by allowing it to use not only XPath 1.0 expressions, but also expressions taken from other languages such as EXSLT (a standard extension library for XSLT), XPath 2.0, XSLT 2.0, and even XQuery 1.0.
Although RELAX NG provides a way to write and combine modular schemas, it is often the case that you need to validate a composite document against existing schemas that might be written using different languages; you might want, for instance, to validate XHTML documents with embedded RDF statements. In this case, you need to split your documents into pieces and validate each piece against its own schema.
The first contribution to Part 4 was an ISO specification known as RELAX Namespace by Murata Makoto. This contribution was followed by Modular Namespaces (MNS) by James Clark, and Namespace Switchboard by Rick Jelliffe. The latest contribution, Namespace Routing Language (NRL), was made by James Clark in June 2003 and builds on previous proposals. Although it is too early to say whether NRL will become DSDL Part 4, it will most likely influence it heavily. NRL is implemented in the latest versions of Jing.
The first example given in the specification (http://www.thaiopensource.com/relaxng/nrl.html) shows how NRL can validate a SOAP message containing one or more XHTML documents:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
</rules>
This example splits the SOAP messages into two parts. The SOAP envelope is validated against the W3C XML Schema soap-envelope.xsd. The one or more XHTML documents found in the body of the SOAP message are validated against the RELAX NG schema xhtml.rng.
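The idea of splitting a document by namespace can be sketched with Python's standard xml.etree.ElementTree module. This is a simplified illustration, not the NRL algorithm: it only looks at element namespaces (NRL also deals with attributes), it only plans which schema would apply to each section without validating anything, and the sample message, the SCHEMAS table, and the function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping mirroring the two <namespace> rules above.
SCHEMAS = {SOAP: "soap-envelope.xsd", XHTML: "xhtml.rng"}

# Made-up sample message: one XHTML document inside a SOAP body.
MESSAGE = f"""<env:Envelope xmlns:env="{SOAP}">
  <env:Body>
    <html xmlns="{XHTML}"><head><title>Hello</title></head><body/></html>
  </env:Body>
</env:Envelope>"""

def namespace_of(element) -> str:
    """ElementTree stores qualified names as '{namespace}local'."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def section_roots(element, parent_ns=None):
    """Yield (namespace, element) wherever the namespace changes, in document order."""
    ns = namespace_of(element)
    if ns != parent_ns:
        yield ns, element
    for child in element:
        yield from section_roots(child, ns)

def plan(xml_text: str) -> list:
    """Return (local name of section root, schema to use) pairs."""
    root = ET.fromstring(xml_text)
    return [(element.tag.split("}")[-1], SCHEMAS.get(ns))
            for ns, element in section_roots(root)]

print(plan(MESSAGE))
```

Each pair names the element that starts a section and the schema that would be applied to it; a namespace with no entry in SCHEMAS yields None.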
More advanced NRL features are available, including namespace wildcards, validation modes, open schemas, and transparent namespaces. These features seem able to handle the most complex cases, as long as the basic assumption holds that instance documents can be split according to the namespaces of their elements and attributes.
The goal of Part 5 is to define a set of primitive datatypes with their constraining facets and the mechanisms to derive new datatypes from this set. It is fair to say that it’s probably the least developed, yet most complex, part of DSDL. While people agree on what shouldn’t be done, it is difficult to get beyond the criticism of existing systems such as W3C XML Schema datatypes and propose something better.
Some interesting ideas were raised during the last DSDL meeting in May 2003 that tend to converge with threads on the XML-DEV mailing list in June. This may lead to something more constructive in future DSDL meetings.
The goal of Part 6 is basically to define a feature covering W3C XML Schema’s xs:unique, xs:key, and xs:keyref. Part 6 hasn’t had any contributions yet.
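Since Part 6 has no proposal yet, the following Python sketch only illustrates the kind of constraint that xs:key and xs:keyref express: key values must be present and unique, and references must point to an existing key. The check_keys function, the element names, and the sample document are all hypothetical, and the paths use the limited XPath subset of xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

def check_keys(root, key_path, key_attr, ref_path, ref_attr):
    """Return problems: missing or duplicate keys, and references to unknown keys."""
    problems = []
    seen = set()
    for element in root.findall(key_path):
        value = element.get(key_attr)
        if value is None:
            problems.append(f"missing key on <{element.tag}>")
        elif value in seen:
            problems.append(f"duplicate key {value!r}")
        else:
            seen.add(value)
    for element in root.findall(ref_path):
        value = element.get(ref_attr)
        if value is not None and value not in seen:
            problems.append(f"dangling reference {value!r}")
    return problems

# Made-up document: author ids act as keys, author-ref ids as key references.
LIBRARY = ET.fromstring(
    '<library>'
    '<author id="a1"/><author id="a1"/>'
    '<book><author-ref id="a1"/><author-ref id="a2"/></book>'
    '</library>'
)

print(check_keys(LIBRARY, "author", "id", "book/author-ref", "id"))
```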
Part 7 allows you to specify which characters can be used in specific elements and attributes or within entire XML documents. The W3C note “A Notation for Character Collections for the WWW” (http://www.w3.org/TR/charcol/) is used as an input for Part 7. The first contribution is “Character Repertoire Validation for XML” (CRVX) (http://dret.net/netdret/docs/wilde-crvx-www2003.html).
A simple example of CRVX is:
<crvx xmlns="http://dret.net/xmlns/crvx10">
  <restrict structure="ename aname pitarget" charrep="\p{IsBasicLatin}"/>
  <restrict structure="ename aname" charrep="[^0-9]"/>
</crvx>
In this proposal, the structure attribute contains identifiers for element names (ename), attribute names (aname), Processing Instruction targets (pitarget), and other XML constructions including element and attribute contents. This example thus requires that element and attribute names and Processing Instruction targets use only characters from the BasicLatin block and that element and attribute names not use digits.
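The effect of these two restrictions on names can be approximated in Python. This is a rough sketch under several assumptions, not an implementation of CRVX: Python's re module has no \p{IsBasicLatin}, so the ASCII range [\x00-\x7F] stands in for it; processing instruction targets are ignored because xml.etree.ElementTree does not expose them by default; namespaces are not handled; and the RESTRICTIONS table, the function names, and the sample document are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Each restriction: (structures it applies to, pattern every character must match).
RESTRICTIONS = [
    ({"ename", "aname"}, re.compile(r"[\x00-\x7F]")),  # stand-in for \p{IsBasicLatin}
    ({"ename", "aname"}, re.compile(r"[^0-9]")),       # no digits in names
]

def names(root):
    """Yield (structure, name) for every element name and attribute name."""
    for element in root.iter():
        yield "ename", element.tag
        for attribute in element.attrib:
            yield "aname", attribute

def violations(xml_text: str) -> list:
    """Return the (structure, name) pairs that break at least one restriction."""
    root = ET.fromstring(xml_text)
    found = []
    for structure, name in names(root):
        for structures, allowed in RESTRICTIONS:
            if structure in structures and not all(allowed.fullmatch(ch) for ch in name):
                found.append((structure, name))
                break  # report each name once
    return found

# Made-up document: a digit in an element name, accented letters in an attribute name.
SAMPLE = '<livre1 \u00e9t\u00e9="x"><titre/></livre1>'
print(violations(SAMPLE))
```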
There is some overlap between Part 7 and other schema languages such as Part 2 (RELAX NG). You need to take care that your names match the rules defined in both places, and you can use the data pattern to check the content of attributes and simple content elements. However, Part 7 gives you a more focused way to express these rules independently of other schemas. It also fills some gaps: RELAX NG can’t express such constraints on name classes or on mixed content elements.
Part 8 is still in development. The idea here is to allow you to add information (such as default values) to documents depending on the structure of the document. The only input considered for Part 8 so far is known as Architectural Forms, an old technology with strong adherents but limited use.
There were plenty of good things in DTDs, especially in SGML DTDs. Many people are still using them and question the need to throw them away and define new schema languages just to support namespaces and datatypes. DSDL Part 9 is for the people who would like to rely on years of DTD experience without losing all the goodies of newer schema languages. Despite a burst of discussion in April 2002, this part hasn’t advanced yet.
Last but not least, Part 10 (formerly known as Part 1: Interoperability Framework) is the cement that lets you use the different parts of DSDL together with external tools such as XSLT, W3C XML Schema, or your favorite spell checker, to reuse an example given in the introduction to this chapter.
Here again, different contributions have been made, including my own “XML Validation Interoperability Framework” XVIF and Rick Jelliffe’s Schemachine. The latest contribution is known (and implemented) as xvif/outie (see http://downloads.xmlschemata.org/python/xvif/outie/about.xhtml).
A simple example of an xvif/outie document is:
<?xml version="1.0" encoding="utf-8"?>
<framework>
  <rule>
    <instance>
      <transform transformation="normalize.xslt"/>
    </instance>
    <assert>
      <isValid schema="schema.rng"/>
      <isValid schema="schema.sch"/>
    </assert>
  </rule>
</framework>
This document defines a rule that checks on the result of the XSLT transformation normalize.xslt that is applied to the instance document. This rule states that the result of the transformation must be valid for both schema.rng and schema.sch.
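The logic of such a rule (transform the instance, then require every validation to succeed) can be sketched in Python. This is not the xvif/outie implementation or its API: the rule_holds function is hypothetical, and the transform and validators are trivial string-based stand-ins for normalize.xslt, schema.rng, and schema.sch.

```python
from typing import Callable, Iterable

Transform = Callable[[str], str]
Validator = Callable[[str], bool]

def rule_holds(instance: str,
               transforms: Iterable[Transform],
               validators: Iterable[Validator]) -> bool:
    """Apply the transforms in order, then require every validator to accept the result."""
    for transform in transforms:
        instance = transform(instance)
    return all(validator(instance) for validator in validators)

# Hypothetical stand-ins: a real framework would run normalize.xslt here
# and validate the result against schema.rng and schema.sch.
def normalize(text: str) -> str:
    return " ".join(text.split())

def has_library_root(text: str) -> bool:
    return text.startswith("<library")

def is_short(text: str) -> bool:
    return len(text) < 40

document = "  <library>\n  </library> "
print(rule_holds(document, [normalize], [has_library_root, is_short]))
```

The validators see the transformed document, never the original instance, which is the point of placing the transform element inside instance.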