We have seen how to create simple datatypes that can be applied to attributes or simple type elements. It’s now time to learn how complex types can be created.
Before we start diving into complex types, I would like to reiterate the fundamental difference between simple and complex types. The simple datatypes that we saw in the previous chapters describe the content of a text node or an attribute value. They are completely independent of the other nodes and, therefore, independent of the markup. The same datatype system can be used to describe the content of any format, even if it is not XML but an RDBMS (Relational DataBase Management System), CSV (Comma Separated Values), or a fixed-sized text format.
The complex types discussed in this chapter (and, more specifically, the complex content models) are, on the contrary, a description of the markup structure. They use simple datatypes to describe their leaf element nodes and attribute values, but have no other links with simple datatypes. Keep this in mind, especially when we study the derivation methods for complex datatypes. Even though the names (and elements) are sometimes the same as those we’ve seen for simple datatypes, their meaning, usage, and content models are different. When we discuss the xs:restriction element, for instance, you will see that this element has a different meaning and content model for simple types than it does for complex types. (In fact, this element even has two different content models for complex types, depending on its context.) Among the different content models composing complex types, the simple and mixed content models are special cases in which elements may have text nodes.
There is a kind of no man’s land between simple types and complex contents, where the distinction between data and markup (or datatypes and structures) becomes fuzzier for W3C XML Schema. This ambiguity is a frequent source of confusion and complexity for human readers, but also for W3C XML Schema editing software and reference guides.
W3C XML Schema has introduced many different ways of reaching your information modeling goals, and we will try to draw a global picture of the landscape to avoid getting lost! We have to make two key choices: which content model to use, and whether to create new types or to derive them from previously defined types.
Let’s go back over the definition of the content models and try to illustrate the different cases in Table 7-1. It shows the relationship between content model and child text and element nodes.
| Content model | Mixed | Complex | Simple | Empty |
|---|---|---|---|---|
| Child elements | Yes | Yes | No | No |
| Child text | Yes | No | Yes | No |
W3C XML Schema provides two main ways to define complex types: one for complex content models and one for simple content models. It also offers several tricks for piggybacking the definition of mixed and empty contents on these definitions (through a mixed attribute on a complex type definition for mixed contents, and by omitting the option to declare elements or assigning a simple content that imposes a null value for empty contents).
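As a minimal sketch of the first two of these techniques, the following schema declares one element with mixed content and one with empty content. The element names p, em, and br are hypothetical and are not part of our library vocabulary; they are used here only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Mixed content: text may appear around and between the child elements. -->
  <xs:element name="p">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="em" type="xs:token" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- Empty content: no child elements are declared, only an attribute. -->
  <xs:element name="br">
    <xs:complexType>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```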
Like simple datatypes, complex datatypes can be either named (i.e., global) or anonymous (i.e., local). Global definitions must have a name and be top-level elements included directly in the xs:schema document element. They can then be referenced directly in an element definition through the type attribute of the element, and new complex types can be derived from them. Local complex types are defined directly where they are needed in a schema; they are anonymous (i.e., they have no name attribute) and have a local scope.
For simple datatypes, there is no choice: we cannot create new primitive datatypes, so new simple types must be defined by derivation. For complex datatypes, the situation is the opposite: there are no primitive complex types, and complex types must be created before we can do any derivation. Once our first complex types exist, we have the choice of defining new content models from scratch or deriving them by extension or restriction from previously defined complex types. This makes it possible for libraries of complex datatypes to be reused within a schema or between different schemas. As far as validation is concerned, these derivations do not change anything compared to simpler definitions: they allow definition of exactly the same models applying to the same instance documents. On the other hand, some applications might be able to draw conclusions from the chain of derivations.
We will start by looking at complex types containing simple content because they are closest to simple types, which we’ve seen recently, and they also provide an easier transition to the more complex world of complex contents. We will not discuss the creation and derivation of simple types, already covered in Chapter 5, but instead will focus on complex types’ simple content models (i.e., elements having only text nodes and attributes) and study how they are created and derived.
Complex types with simple content models are created by adding a list of attributes to a simple type. The operation of adding attributes to a simple type to create a simple content complex type is called an extension of the simple type. The syntax is straightforward and we have already seen examples of such creation in Chapter 4:
<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="string255">
        <xs:attribute ref="lang"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
The only points to note here are that the definition of the simple type cannot be embedded directly in the xs:extension element and that it must be referenced through its base attribute.
This same syntax, with the same meaning, can be used to create global complex types, which can be used to define elements:
<xs:complexType name="tokenWithLang">
  <xs:simpleContent>
    <xs:extension base="xs:token">
      <xs:attribute ref="lang"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
<xs:element name="title" type="tokenWithLang"/>
Complex types provide a number of options for extending simple content models.
Derivation by extension is reserved for complex types and has no equivalent for simple types. It increases the number of child node elements or attributes allowed or expected in the complex type. For simple content complex types, child elements cannot be added and we stay with an extension that is identical to the method used to create a simple content complex type from a simple type. To add an attribute to the complex type tokenWithLang, just shown in the previous example, we could write:
<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="tokenWithLang">
        <xs:attribute name="note" type="xs:token"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
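For illustration, and assuming that the global lang attribute is declared with the type xs:language (as it is in the full schemas shown later in this chapter), instance elements such as the following would be expected to validate against this definition. The note value is invented for the example.

```xml
<!-- Both attributes are optional, so either form is acceptable. -->
<title lang="en" note="first edition">Being a Dog Is a Full-Time Job</title>
<title lang="en">Being a Dog Is a Full-Time Job</title>
```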
The derivation by restriction of simple content complex types is a feature at the border between the two parts of W3C XML Schema (Part 1: Structures and Part 2: Datatypes). It’s also very similar to the derivation by restriction of simple datatypes, discussed in Chapter 6. The only difference between the derivations by restriction in these two contexts is that the derivation by restriction of a simple content complex type allows not only restriction of the scope of the text node, but also restriction of the scope of the attributes. This restriction follows the same principle as the restriction of a simple type: any instance structure deemed valid per the restricted type must also be valid per the base type (with the exception already mentioned for the xs:whiteSpace facet).
The syntax used to restrict the text child is the same as the syntax used to derive simple types by restriction. The facets are the same as well. These facets must be followed by the new list of attributes, which may have different types as long as they are derived from the types of the attributes from the base type. Attributes that are not mandatory in the base type can be specified in the new list as “prohibited,” and attributes that are not included are considered unchanged. Following are some examples of derivations that start from a simple content datatype equivalent to the content model just shown:
<xs:complexType name="tokenWithLangAndNote">
  <xs:simpleContent>
    <xs:extension base="xs:token">
      <xs:attribute name="lang" type="xs:language"/>
      <xs:attribute name="note" type="xs:token"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
We can first show how to restrict the length of the text node, as we’ve done for simple types:
<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" type="xs:token"/>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
To remove the note attribute from the element title, we declare note to be prohibited in the list of attributes in the restriction:
<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" use="prohibited"/>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
We can also restrict the datatype by restricting its attributes. For instance, if we want to restrict the number of possible languages, we can do it directly in the definition of the lang attribute in the derived type:
<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang">
          <xs:simpleType>
            <xs:restriction base="xs:language">
              <xs:enumeration value="en"/>
              <xs:enumeration value="es"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
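To make the effect of this last restriction concrete, here is a sketch of two instance elements checked against it. The title text is taken from our library example, and the comments state the outcome one would expect from a conforming processor rather than actual tool output.

```xml
<!-- Expected to be valid: lang is one of the two enumerated values. -->
<title lang="en">Being a Dog Is a Full-Time Job</title>

<!-- Expected to be invalid: "fr" is a legal xs:language value, but it is
     not in the enumeration of the restricted attribute type. -->
<title lang="fr">Being a Dog Is a Full-Time Job</title>
```

Because the note attribute is not mentioned in the restriction, it is left unchanged and remains allowed on title.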
Despite apparent similarities, derivations by extension and restriction do not have much more in common than deriving new simple content types from base types! Derivation by extension can only add new attributes. It can neither change the datatype of the text node nor the type of an attribute defined in its base type. Derivation by restriction appears to be more flexible and can restrict the datatype of the text node and of the attributes of the base type. It can also remove attributes that are not mandatory in its base type.
Restricting or extending simple content models is useful, but XML is not very useful without more complex models.
Complex contents are created by defining the list (and order) of their elements and attributes. We have already seen a couple of examples of complex content models, defined as local complex types in Chapter 1 and Chapter 2:
<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="book" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
These examples show the basic structure of a complex type with complex content definition: the xs:complexType element holds the definition. Here, this definition is local (xs:complexType is not top-level since it is included under an xs:element element) and, thus, anonymous. Under xs:complexType, we find the sequence of children elements (xs:sequence) and the list of attributes.
In these examples, the xs:sequence elements have a role as “compositors” and the xs:element elements, which are included in xs:sequence, play a role of “particle.” This simple scenario may be extended using other compositors and particles.
W3C XML Schema defines three different compositors: xs:sequence, to define ordered lists of particles; xs:choice, to define a choice of one particle among several; and xs:all, to define unordered lists of particles. The xs:sequence and xs:choice compositors can define their own number of occurrences using minOccurs and maxOccurs attributes and they can be used as particles (some important restrictions apply to xs:all, which cannot be used as a particle, as we will see in the next section).
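As a sketch of a compositor carrying its own occurrence constraints, the following declaration uses a repeatable xs:choice. The element name cast is hypothetical; author and character are the elements of our library. Because the choice itself may repeat, any number of author and character elements can appear in any order.

```xml
<xs:element name="cast">
  <xs:complexType>
    <!-- The choice is repeated, so authors and characters may be interleaved. -->
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="author"/>
      <xs:element ref="character"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
```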
The particles are xs:element, xs:sequence, xs:choice, plus xs:any and xs:group, which we will see later in the section. The ability to include compositors within compositors is key to defining complex structures, although it is unfortunately subject to the allergy of W3C XML Schema for “nondeterminism.”
To give an idea of the kind of structures that can be defined, let’s suppose that the names in our library may be expressed in two different ways: either as a name element, as we have shown up to now, or as three different elements to define the first, middle, and last name (the middle name should be optional). Names could then be expressed as one of the three following combinations:
<first-name>Charles</first-name>
<middle-name>M</middle-name>
<last-name>Schulz</last-name>
or:
<first-name>Peppermint</first-name>
<last-name>Patty</last-name>
or:
<name>Snoopy</name>
To describe this, we will replace the reference to the name element with a choice between either a name element or a sequence of first-name, middle-name (optional), and last-name. The definition of author then becomes:
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:choice>
        <xs:element ref="name"/>
        <xs:sequence>
          <xs:element ref="first-name"/>
          <xs:element ref="middle-name" minOccurs="0"/>
          <xs:element ref="last-name"/>
        </xs:sequence>
      </xs:choice>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
The name element also appears in the character element, and a copy/paste can be used to replace it with the xs:choice structure, but we would rather take this opportunity to introduce a new feature that is very handy for manipulating reusable sets of elements.
Element and attribute groups are containers in which sets of elements and attributes may be embedded and manipulated as a whole. These simple and flexible structures are very convenient for defining bits of content models that can be reused in multiple locations, such as the xs:choice structure that we created for our name.
The first step is to define the element group. The definition needs to be named and global (i.e., immediately under the xs:schema element) and has the following form:
<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="first-name"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
These groups can then be used by reference as particles within compositors:
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
Groups of attributes can be created in the same way using xs:attributeGroup:
<xs:attributeGroup name="bookAttributes">
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
</xs:attributeGroup>
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attributeGroup ref="bookAttributes"/>
  </xs:complexType>
</xs:element>
Let’s try a new example to illustrate one of the most constraining limitations of W3C XML Schema. We may want to describe all the pages of our books and to have a different description using different elements, such as odd-page and even-page for odd and even pages that require a different pagination. We can try to describe the new content model in the following group:
<xs:group name="pages">
  <xs:sequence>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="odd-page"/>
      <xs:element ref="even-page"/>
    </xs:sequence>
    <xs:element ref="odd-page" minOccurs="0"/>
  </xs:sequence>
</xs:group>
This seems like a simple, smart way to describe the sequences of odd and even pages: a sequence of odd and even pages, possibly followed by a last odd page. The model covers books with an odd or even number of pages as well as tiny booklets with a single page. Neither XSV nor Xerces appears to enjoy it, though:
XSV:
vdv@evlist:~/w3c-xml-schema/user/examples/complex-types$ xsd -n first-ambigous.xsd first-ambigous.xml
using xsv (default)
<?xml version='1.0'?>
<xsv docElt='{None}library' instanceAssessed='true' instanceErrors='0'
     rootType='[Anonymous]' schemaDocs='first-ambigous.xsd' schemaErrors='1'
     target='/home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xml'
     validation='strict' version='XSV 1.203.2.20/1.106.2.11 of 2001/11/01 17:07:43'
     xmlns='http://www.w3.org/2000/05/xsv'>
  <schemaDocAttempt URI='/home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xsd'
                    outcome='success' source='command line'/>
  <schemaError char='7' line='65' phase='instance'
               resource='file:///home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xsd'>
    non-deterministic content model for type None: {None}:odd-page/{None}:odd-page
  </schemaError>
</xsv>
Xerces:
vdv@evlist:~/w3c-xml-schema/user/examples/complex-types$ xsd -n first-ambigous.xsd -p xerces-cvs first-ambigous.xml
using xerces-cvs
startDocument
[Error] first-ambigous.xml:2:10: Error: cos-nonambig: (,odd-page) and (,odd-page) violate the "Unique Particle Attribution" rule.
endDocument
Misled by the apparent flexibility of construction with compositors and particles, we violated an ancient taboo known in SGML as “ambiguous content models,” which was imported into XML’s DTDs as “nondeterministic content models,” and preserved by W3C XML Schema as the “Unique Particle Attribution Rule.”
In practice, this rule adds a significant amount of complexity to writing a W3C XML Schema, since it must be checked after all the many features that allow you to define, redefine, derive, import, reference, and substitute complex types have been resolved by the schema processor. The Recommendation recognizes that “given the presence of element substitution groups and wildcards, the concise expression of this constraint is difficult.” When these features have been resolved, the remaining constraint requires that a schema processor should never have any doubt about which branch it is in while doing the validation of an element and looking only at this element. Applied to the previous example, which was as simple as possible, there is a problem. When a schema processor meets the first odd-page element, it has no way of knowing if the page will be followed by an even-page element without first looking ahead to the next element. This is a violation of the Unique Particle Attribution Rule.
This example, adapted from an example describing a chess board, is one of the famous instances in which the content model cannot be written in a “deterministic” way. This is not always the case, and many nondeterministic constructions describe content models that may be rewritten in a deterministic fashion. We should differentiate those that are fundamentally nondeterministic from those that are only “accidentally” nondeterministic.
Let’s go back to our example with a “name” sequence that can have two different content models, and imagine that instead of using first-name, we reused the name name. The content model is now either name or a sequence of name, middle-name, and last-name:
<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
Here again, when the processor meets a name element, it has no way of knowing (without looking ahead) if this element matches the first or the second branch of the choice. In this case, though, the content model may be simplified if we note that the name element is common to both branches and that, in fact, we now have a mandatory name element followed by an optional sequence of an optional middle-name and a mandatory last-name. The content model can then be rewritten in a deterministic way as:
<xs:group name="name">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:sequence minOccurs="0">
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:sequence>
</xs:group>
This is a slippery path, though, which frequently depends on slight nuances in the content model and leads to schemas that are very difficult to maintain and may require unsatisfactory compromises. If the requirement for the content model we have just written is changed and the name element in the second branch is no longer mandatory, then we are in trouble. The new content model is as follows:
<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="name" minOccurs="0"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
But this model is nondeterministic for the same reason that the previous one was, and we need to reevaluate the different possible combinations to find that the new content model can now be expressed as:
<xs:group name="name">
  <xs:choice>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:sequence minOccurs="0">
        <xs:element ref="middle-name" minOccurs="0"/>
        <xs:element ref="last-name"/>
      </xs:sequence>
    </xs:sequence>
    <xs:sequence>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
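As a quick check of this rewriting, the following instance fragments list the combinations that the deterministic model should accept; they are the same combinations allowed by the nondeterministic version. Each group of lines separated by a blank line is one alternative, and the element values are only illustrative.

```xml
<!-- First branch: name, optionally followed by (middle-name?, last-name) -->
<name>Snoopy</name>

<name>Peppermint</name>
<last-name>Patty</last-name>

<name>Charles</name>
<middle-name>M</middle-name>
<last-name>Schulz</last-name>

<!-- Second branch: (middle-name?, last-name) without a leading name -->
<last-name>Schulz</last-name>

<middle-name>M</middle-name>
<last-name>Schulz</last-name>
```

In every case the first element met by the processor (name on one side, middle-name or last-name on the other) is enough to decide which branch of the choice applies.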
Formal theories and algorithms can rewrite nondeterministic content models in a deterministic way when possible. Hopefully, W3C XML Schema development tools will integrate some of these algorithms to propose an alternative when a schema author creates nondeterministic content models.
Ambiguous content models were already a controversial issue in the SGML community in the 1990s, and the restriction was maintained in XML DTDs under the name “nondeterministic content models” to keep compatibility with SGML parsers, despite the dissent of Tim Bray, Jean Paoli, and Peter Sharpe, three influential members of the XML Special Interest Group. The motivation to maintain the restriction in W3C XML Schema is to keep schema processors simple to implement and to allow implementations through finite state machines (FSM). The execution time of these automatons could grow exponentially when the Unique Particle Attribution Rule is violated. This decision has been heavily criticized by experts including Joe English, James Clark, and Murata Makoto, who have shown that other simple algorithms might be used that keep the processing time linear when this rule is not met. This is also one of the main differences in descriptive power between W3C XML Schema and schema languages such as RELAX, TREX, and RELAX NG, which do not impose this rule.
Although not related, strictly speaking, the Unique Particle Attribution Rule and the Consistent Declaration Rule are often associated, since, in practice, when the Consistent Declaration Rule is violated, the Unique Particle Attribution Rule is often violated too. This new rule is much easier to explain and understand, since it only states that W3C XML Schema explicitly forbids choices between elements with the same name and different types, such as in the following:
<xs:choice>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="name">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="first-name"/>
        <xs:element ref="middle-name"/>
        <xs:element ref="last-name"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:choice>
We will see a workaround using the xsi:type attribute, which may be used by some applications, in Chapter 11.
While useful, unordered content models have their own sets of limitations.
Unordered content models (i.e., content models that do not impose any order on the children elements) not only increase the risks of nondeterministic content models, but are also an important complexity factor for schema processors. For the sake of implementation simplicity, the Recommendation has imposed huge limitations on the xs:all element, which make it hardly usable in practice. xs:all cannot be used as a particle, but as a compositor only; xs:all cannot have a number of occurrences greater than one; the particles included within xs:all must be xs:element; and these particles must not specify numbers of occurrences greater than one.
To illustrate these limitations, let’s imagine we have decided to simplify the life of document producers and want to create a vocabulary that doesn’t care about the relative order of children elements. With a simple vocabulary such as the one defined in our first schema, this wouldn’t add a big burden to the applications handling our vocabulary. When you think about it, there is no special reason to impose the definition of the title of a book after its ISBN number or the definition of the list of authors before the list of characters. The first content model that may be affected by this decision is the content model of the book element:
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
    <xs:attribute ref="available"/>
  </xs:complexType>
</xs:element>
Unfortunately, here the xs:sequence cannot be replaced by xs:all, since two of the children elements (author and character) have a maximum number of occurrences that is “unbounded” and thus higher than one. The second group of candidates includes the content models of author and character, which are relatively similar:
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
The good news here is that both author and character match the criteria for xs:all, so we can write:
<xs:element name="author">
  <xs:complexType>
    <xs:all>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
<xs:element name="character">
  <xs:complexType>
    <xs:all>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
We now have two elements (author and character) in which the order of children elements is not significant. One may question, though, whether this is very interesting since this independence is not consistent throughout the schema. More importantly, we must note that we have lost a great deal of flexibility and extensibility by using an xs:all compositor. Since the maximum number of occurrences for each child element needs to be one, we can no longer, for instance, change the number of occurrences of the qualification element to accept several qualifications in different languages. And since the particles used in xs:all cannot be compositors or groups, we can’t extend the content model to accept both name and the sequence first-name, middle-name, and last-name either.
Since xs:all appears to be pretty ineffective in general, there are a couple of workarounds that may be proposed for people who would like to develop order-independent vocabularies.
The first workaround, which may be used only if you are creating your own vocabulary from scratch, is to adapt the structures of your document to the constraints of xs:all. In practice, this means that each time we have to use an xs:choice, an xs:sequence, or include elements with more than one occurrence, we will add a new element as a container. For instance, we will create containers named authors and characters that will encapsulate the multiple occurrences of author and character. The result is instance documents such as:
<?xml version="1.0"?>
<library>
  <book id="b0836217462" available="true">
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <isbn>0836217462</isbn>
    <authors>
      <author id="CMS">
        <born>1922-11-26</born>
        <dead>2000-02-12</dead>
        <name>Charles M Schulz</name>
      </author>
    </authors>
    <characters>
      <character id="PP">
        <name>Peppermint Patty</name>
        <qualification>bold, brash and tomboyish</qualification>
        <born>1966-08-22</born>
      </character>
      <character id="Snoopy">
        <born>1950-10-04</born>
        <name>Snoopy</name>
        <qualification>extroverted beagle</qualification>
      </character>
      <character id="Schroeder">
        <qualification>brought classical music to the Peanuts strip</qualification>
        <name>Schroeder</name>
        <born>1951-05-30</born>
      </character>
      <character id="Lucy">
        <name>Lucy</name>
        <born>1952-03-03</born>
        <qualification>bossy, crabby and selfish</qualification>
      </character>
    </characters>
  </book>
</library>
This instance document is defined by a full schema, which could be:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:token"/>
  <xs:element name="qualification" type="xs:token"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:NMTOKEN"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:token">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:all>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="authors"/>
        <xs:element ref="characters"/>
      </xs:all>
      <xs:attribute ref="id"/>
      <xs:attribute ref="available"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="characters">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
This adaptation of the instance document will be more painful if we want to implement our alternative “name” content model. Since we cannot include an xs:choice in an xs:all compositor, we have to add a first level of container (name), which is always the same, and a second level of container (simple-name or complex-name), which expresses the choice. This leads to instance documents such as:
<?xml version="1.0"?> <library> <book id="b0836217462" available="true"> <title lang="en"> Being a Dog Is a Full-Time Job </title> <isbn> 0836217462 </isbn> <authors> <author id="CMS"> <born> 1922-11-26 </born> <dead> 2000-02-12 </dead> <name> <complex-name> <last-name> Schulz </last-name> <first-name> Charles </first-name> <middle-name> M </middle-name> </complex-name> </name> </author> </authors> <characters> <character id="PP"> <name> <complex-name> <first-name> Peppermint </first-name> <last-name> Patty </last-name> </complex-name> </name> <qualification> bold, brash and tomboyish </qualification> <born> 1966-08-22 </born> </character> <character id="Snoopy"> <born> 1950-10-04 </born> <name> <simple-name> Snoopy </simple-name> </name> <qualification> extroverted beagle </qualification> </character> <character id="Schroeder"> <qualification> brought classical music to the Peanuts strip </qualification> <name> <simple-name> Schroeder </simple-name> </name> <born> 1951-05-30 </born> </character> <character id="Lucy"> <name> <simple-name> Lucy </simple-name> </name> <born> 1952-03-03 </born> <qualification> bossy, crabby and selfish </qualification> </character> </characters> </book> </library>
The adaptation of the schema is then straightforward and could be (keeping a flat design):
<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="simple-name" type="xs:token"/> <xs:element name="first-name" type="xs:token"/> <xs:element name="middle-name" type="xs:token"/> <xs:element name="last-name" type="xs:token"/> <xs:element name="qualification" type="xs:token"/> <xs:element name="born" type="xs:date"/> <xs:element name="dead" type="xs:date"/> <xs:element name="isbn" type="xs:NMTOKEN"/> <xs:attribute name="id" type="xs:ID"/> <xs:attribute name="available" type="xs:boolean"/> <xs:attribute name="lang" type="xs:language"/> <xs:element name="name"> <xs:complexType> <xs:choice> <xs:element ref="simple-name"/> <xs:element ref="complex-name"/> </xs:choice> </xs:complexType> </xs:element> <xs:element name="complex-name"> <xs:complexType> <xs:all> <xs:element ref="first-name"/> <xs:element ref="middle-name" minOccurs="0"/> <xs:element ref="last-name"/> </xs:all> </xs:complexType> </xs:element> <xs:element name="title"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:token"> <xs:attribute ref="lang"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element name="library"> <xs:complexType> <xs:sequence> <xs:element ref="book" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="authors"> <xs:complexType> <xs:sequence> <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="author"> <xs:complexType> <xs:all> <xs:element ref="name"/> <xs:element ref="born"/> <xs:element ref="dead" minOccurs="0"/> </xs:all> <xs:attribute ref="id"/> </xs:complexType> </xs:element> <xs:element name="book"> <xs:complexType> <xs:all> <xs:element ref="isbn"/> <xs:element ref="title"/> <xs:element ref="authors"/> <xs:element ref="characters"/> </xs:all> <xs:attribute ref="id"/> <xs:attribute ref="available"/> </xs:complexType> </xs:element> <xs:element name="characters"> <xs:complexType> <xs:sequence> 
<xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="character"> <xs:complexType> <xs:all> <xs:element ref="name"/> <xs:element ref="born"/> <xs:element ref="qualification"/> </xs:all> <xs:attribute ref="id"/> </xs:complexType> </xs:element> </xs:schema>
This process may be generalized and used for purposes other than adapting instance documents to the constraints of xs:all. It is interesting to note that we have “externalized” the complexity, which was previously hidden from the instance document in the schema, to bring the full structure of the content model into the instance document itself. The choices and sequences (an element with multiple occurrences is nothing more than an implicit sequence) are now expressed through containers in the instance documents. Since the structure is more apparent in the instance documents, it can be considered more readable; some people find it good practice to use such containers.
xs:choice instead of xs:all
When it is not possible or not practical to adapt the structure of a document to the limitations of xs:all, another workaround is to replace xs:all compositors with xs:choice. This trick is far less generic than the adaptation of structures we just saw, and it may be surprising that two compositors with very different meanings could be “interchanged.” It applies only when loose control over the number of occurrences is acceptable, such as in a container that accepts both author and character elements in any order with any number of occurrences. Such a container can be defined as:
<xs:element name="persons"> <xs:complexType> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element ref="author"/> <xs:element ref="character"/> </xs:choice> </xs:complexType> </xs:element>
This definition has the same meaning as the following xs:all definition, which is forbidden because the particles of an xs:all cannot have a maxOccurs greater than 1:
<xs:element name="persons"> <xs:complexType> <xs:all> <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/> </xs:all> </xs:complexType> </xs:element>
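As an illustration, here is a sketch of an instance that the xs:choice definition of persons would accept. It assumes the author and character declarations of the first flat schema, where name is a simple xs:token; the values are only sample data.

```xml
<persons>
  <!-- character and author elements may be interleaved freely -->
  <character id="Snoopy">
    <name>Snoopy</name>
    <born>1950-10-04</born>
    <qualification>extroverted beagle</qualification>
  </character>
  <author id="CMS">
    <name>Charles M. Schulz</name>
    <born>1922-11-26</born>
    <dead>2000-02-12</dead>
  </author>
  <character id="Lucy">
    <name>Lucy</name>
    <born>1952-03-03</born>
    <qualification>bossy, crabby and selfish</qualification>
  </character>
</persons>
```

An empty persons element is accepted as well, since the choice has a minOccurs of 0.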
Complex contents can also be derived, by extension or by restriction, from complex types. Before we see the details of these mechanisms, note that they are not symmetrical and their semantics are very different. The derivation of a complex content by restriction is a restriction of the set of matching instances: all the instance structures that match the restricted complex type must also match the base complex type. The derivation of a complex content by extension of a complex type is an extension of the content model by addition of new particles: a content that matches the base type does not necessarily match the extended complex type. This also means that there is no “roundtrip”: in the general case, neither a restricted complex type nor an extended type can be extended or restricted back into its base type.
Derivation by extension is similar to the extension of simple content complex types. It is functionally very similar to joining groups of elements and attributes to create a new complex type. The idea behind this feature is to let people add new elements and attributes after those already defined in the base type. This is virtually equivalent to creating a sequence with the current content model followed by the new content model. Let’s go back to our library to illustrate this. The content models of our elements author and character are relatively similar: author expects name, born, and dead, while character expects name, born, and qualification. If we want to use a derivation by extension, we can first create a base type that contains the first elements common to the content model of both elements:
<xs:complexType name="basePerson"> <xs:sequence> <xs:element ref="name"/> <xs:element ref="born"/> </xs:sequence> <xs:attribute ref="id"/> </xs:complexType>
It is then possible to use derivations by extension to append new elements (dead for author and qualification for character) after those that have already been defined in the base type:
<xs:element name="author"> <xs:complexType> <xs:complexContent> <xs:extension base="basePerson"> <xs:sequence> <xs:element ref="dead" minOccurs="0"/> </xs:sequence> </xs:extension> </xs:complexContent> </xs:complexType> </xs:element> <xs:element name="character"> <xs:complexType> <xs:complexContent> <xs:extension base="basePerson"> <xs:sequence> <xs:element ref="qualification"/> </xs:sequence> </xs:extension> </xs:complexContent> </xs:complexType> </xs:element>
Technically, the meaning of this derivation is equivalent to creating a sequence containing the compositor used to define the base type followed by the compositor included in the xs:extension element. Thus, the content models of these elements are similar to the content models defined as:
<xs:element name="author"> <xs:complexType> <xs:sequence> <xs:sequence> <xs:element ref="name"/> <xs:element ref="born"/> </xs:sequence> <xs:sequence> <xs:element ref="dead" minOccurs="0"/> </xs:sequence> </xs:sequence> <xs:attribute ref="id"/> </xs:complexType> </xs:element> <xs:element name="character"> <xs:complexType> <xs:sequence> <xs:sequence> <xs:element ref="name"/> <xs:element ref="born"/> </xs:sequence> <xs:sequence> <xs:element ref="qualification"/> </xs:sequence> </xs:sequence> <xs:attribute ref="id"/> </xs:complexType> </xs:element>
This equivalence clearly shows how this derivation mechanism works. As stated in the introduction of complex content derivation mechanisms, this is not an extension of the set of valid instance structures: an element character, with its mandatory qualification, does not have a valid basePerson content model, but rather the merge of two content models. This merge itself is subject to limitations: you cannot choose the point where the new content model is inserted; this addition is always done by appending the new compositor after the one of the base type. In our example, if the common elements name and born were not the first two elements, we couldn’t have used a derivation by extension.
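To make the fixed ordering concrete, here is a sketch of instances matching the two extended types. It assumes that name is the simple xs:token element of the first flat schema; the values are sample data.

```xml
<!-- author: the basePerson content (name, born) first, then the optional dead -->
<author id="CMS">
  <name>Charles M. Schulz</name>
  <born>1922-11-26</born>
  <dead>2000-02-12</dead>
</author>

<!-- character: the basePerson content first, then the mandatory qualification -->
<character id="PP">
  <name>Peppermint Patty</name>
  <born>1966-08-22</born>
  <qualification>bold, brash and tomboyish</qualification>
</character>
```

Because both parts are merged through a sequence, placing qualification before name or born would no longer be valid, unlike with the earlier xs:all definitions.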
Another caveat in derivations by extension is that we can’t choose the compositor that is used to merge the two content models. This means that when we derive content models using xs:choice as compositors, it is not the scope of the choices that is extended; rather, the choices are included in an xs:sequence. We could, for instance, extend the content model of the element persons, which we just created and which could be defined as a global complex type:
<xs:complexType name="basePersons"> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element ref="author"/> <xs:element ref="character"/> </xs:choice> </xs:complexType>
If we add a new element using a derivation by extension:
<xs:complexType name="persons"> <xs:complexContent> <xs:extension base="basePersons"> <xs:sequence> <xs:element name="editor" type="xs:token" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:extension> </xs:complexContent> </xs:complexType>
The result is a content type that is equivalent to:
<xs:complexType name="personsEquivalent"> <xs:sequence> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element ref="author"/> <xs:element ref="character"/> </xs:choice> <xs:sequence> <xs:element name="editor" type="xs:token" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:sequence> </xs:complexType>
There is no way to obtain an extension of the xs:choice
such as:
<xs:complexType name="personsAsWeWouldHaveLiked"> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element ref="author"/> <xs:element ref="character"/> <xs:element name="editor" type="xs:token"/> </xs:choice> </xs:complexType>
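The difference is easier to see on an instance. The sketch below assumes a hypothetical declaration `<xs:element name="persons" type="persons"/>`, which is not part of the schema above, and a name element of type xs:token; the editor values are placeholders.

```xml
<!-- Accepted by the extended type: every author and character
     comes before the first editor -->
<persons>
  <author id="CMS">
    <name>Charles M. Schulz</name>
    <born>1922-11-26</born>
  </author>
  <character id="Snoopy">
    <name>Snoopy</name>
    <born>1950-10-04</born>
    <qualification>extroverted beagle</qualification>
  </character>
  <editor>First Editor</editor>
  <editor>Second Editor</editor>
</persons>
```

Moving one of the editor elements before the character would be rejected by the extended type, although the personsAsWeWouldHaveLiked content model would have accepted it.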
The situation with xs:all is even worse: the restrictions on the composition of xs:all still apply. This means you can’t add any content to a complex type defined with an xs:all (although you can still add new attributes), and you can only use an xs:all compositor in a derivation by extension if the base type has an empty content model.
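As a sketch of what remains possible, the following types extend a complex type defined with xs:all by adding only an attribute. The names personAll and personAllWithLang are invented for this illustration; the fragment reuses the global name, born, id, and lang declarations of the flat schema.

```xml
<!-- Base type whose content model is an xs:all -->
<xs:complexType name="personAll">
  <xs:all>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:all>
  <xs:attribute ref="id"/>
</xs:complexType>

<!-- Extension that adds an attribute and no element content -->
<xs:complexType name="personAllWithLang">
  <xs:complexContent>
    <xs:extension base="personAll">
      <xs:attribute ref="lang"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

Adding an xs:sequence or another xs:all inside this xs:extension is what the limitation described above rules out.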
Whereas derivation by extension is similar to merging two content models through an xs:sequence compositor, derivation by restriction is a restriction of the number of instance structures matching the complex type. In this respect, it is similar to the derivation by restriction of simple datatypes or simple content complex types (even though we’ve seen that a facet such as xs:whiteSpace can expand the number of instance documents matching a simple type). Note that this is the only similarity between derivations by restriction of simple and complex datatypes. This is highly confusing, since W3C XML Schema uses the same word and even the same element name in both cases, but these words have different meanings and the content models of the xs:restriction elements are different.
Unlike simple type derivation, there are no facets to apply to complex types, and the derivation is done by defining the full content model of the derived datatype, which must be a logical restriction of the base type. Any instance structure valid per the derived datatype must also be valid per the base datatype. The W3C XML Schema specification does not define the derivation by restriction in these terms, but defines a formal algorithm to be followed by schema processors, which is roughly equivalent.
The derivation by restriction of a complex type is a declaration of intention that the derived type is a subset of the base type. (It is a declaration rather than a derivation of the kind we’ve seen for simple types. This declaration is needed for features allowing substitutions and redefinitions of types, which we will see in Chapter 8 and Chapter 12, and it may provide useful information to some applications.) When we derive simple types, we can take a base type without having to care about the details of the facets that are already applied, and just add our own set of facets. Here, on the contrary, we need to provide a full definition of the content model. The exception is attributes: they need not be repeated when they are not modified, and they can be declared as “prohibited” to exclude them from the restricted type, something we have seen for the restriction of complex types with simple contents.
Moving on, let’s try to find a base from which we can derive both the author and character elements by restriction. This time, we can be sure that such a complex type exists, since all complex types can be derived from an abstract xs:anyType, which allows any elements and attributes. In practice, however, we will try to find the most restrictive base type that can accommodate our needs. Since the name and born elements are present in both author and character, with the same number of occurrences, we can keep them as they appear. We then have two elements (dead and qualification), which appear in only one of the two elements author and character. Since both author and character will need to be valid per the base type, we will include both of them in the base type but make them optional by giving them a minOccurs attribute equal to 0. Our base type can then be:
<xs:complexType name="person"> <xs:sequence> <xs:element ref="name"/> <xs:element ref="born"/> <xs:element ref="dead" minOccurs="0"/> <xs:element ref="qualification" minOccurs="0"/> </xs:sequence> <xs:attribute ref="id"/> </xs:complexType>
The derivations are then done by defining the content model within an xs:restriction element (note that we have not repeated the attribute declarations, which are not modified):
<xs:element name="author"> <xs:complexType> <xs:complexContent> <xs:restriction base="person"> <xs:sequence> <xs:element ref="name"/> <xs:element ref="born"/> <xs:element ref="dead" minOccurs="0"/> </xs:sequence> </xs:restriction> </xs:complexContent> </xs:complexType> </xs:element> <xs:element name="character"> <xs:complexType> <xs:complexContent> <xs:restriction base="person"> <xs:sequence> <xs:element ref="name"/> <xs:element ref="born"/> <xs:element ref="qualification"/> </xs:sequence> </xs:restriction> </xs:complexContent> </xs:complexType> </xs:element>
We see here that the syntax of a derivation by restriction is more verbose than the syntax of the straight definition of the content model. The purpose of this derivation is not to build modular schemas, but rather to give applications that use this schema the indication that there is some commonality between the content models: if they know how to handle the complex type “person,” they can handle the elements author and character. We will see W3C XML Schema features that rely on this derivation method in Chapter 8 and Chapter 12.
Changing the number of occurrences of particles is not the only modification that can be done during a derivation by restriction. Other operations that reduce the number of valid instance structures are also possible, such as changing a simple type to a more restrictive one or fixing values. The main constraint in this mechanism is that each particle of the derived type must be an explicit derivation of the corresponding particle of the base type. The effect of this constraint is to limit the “depth” of the restrictions that can be performed in a single step: when we need to restrict particles at a deeper level of nesting, we may have to transform local definitions into global ones. We will see a concrete example in Section 7.5.1 with mixed content models, which are similar in this respect.
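A restriction can, for instance, also tighten an attribute. The sketch below is an illustration rather than part of the library schema: the hypothetical type identifiedCharacter restricts the person type defined earlier by making qualification mandatory, leaving out the optional dead element, and requiring the id attribute.

```xml
<xs:complexType name="identifiedCharacter">
  <xs:complexContent>
    <xs:restriction base="person">
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <!-- minOccurs goes from 0 in the base type to the default of 1 -->
        <xs:element ref="qualification"/>
      </xs:sequence>
      <!-- the attribute is optional in the base type and required here -->
      <xs:attribute ref="id" use="required"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```

Every instance valid per this type remains valid per person, which is the condition a derivation by restriction has to satisfy.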
We now have all the elements we need to look back at the claim about the asymmetry of these derivation methods. This lack of symmetry is not a defect as such, but studying it is a good exercise for understanding the meaning of these two derivation methods. Let’s examine the derivation by extension of basePerson into the character element:
<xs:complexType name="basePerson"> <xs:sequence> <xs:element ref="name"/> <xs:element ref="born"/> </xs:sequence> <xs:attribute ref="id"/> </xs:complexType> <xs:element name="character"> <xs:complexType> <xs:complexContent> <xs:extension base="basePerson"> <xs:sequence> <xs:element ref="qualification"/> </xs:sequence> </xs:extension> </xs:complexContent> </xs:complexType> </xs:element>
The content model of character contains a mandatory qualification element. Valid characters are not valid per basePerson; thus, there is no hope of deriving character back into basePerson by restriction, since all the instance structures that are valid per the derived type must be valid per the base type in a derivation by restriction.
Let’s look back at the derivation by restriction of
the person
base type into a
character
element:
<xs:complexType name="person"> <xs:sequence> <xs:element ref="name"/> <xs:element ref="born"/> <xs:element ref="dead" minOccurs="0"/> <xs:element ref="qualification" minOccurs="0"/> </xs:sequence> <xs:attribute ref="id"/> </xs:complexType> <xs:element name="character"> <xs:complexType> <xs:complexContent> <xs:restriction base="person"> <xs:sequence> <xs:element ref="name"/> <xs:element ref="born"/> <xs:element ref="qualification"/> </xs:sequence> </xs:restriction> </xs:complexContent> </xs:complexType> </xs:element>
Again, it is not possible to derive the complex type of character back into person by extension, since this would mean changing the minimum number of occurrences of qualification from 1 to 0 and adding an optional dead element between born and qualification. Neither of these operations is possible during a derivation by extension, which can only append new content after the content of the base type; it can neither update an existing particle (to change the number of occurrences) nor insert a new particle between two existing particles.
Although W3C XML Schema permits mixed content models and describes them better than XML DTDs do, W3C XML Schema treats them as an add-on plugged on top of complex content models. The good news is that this allows control of child elements exactly as we’ve just seen for complex contents. The bad news is that we abandon any control over the child text nodes, whose values cannot be constrained at all, and, of course, the descriptions of the child elements are subject to the same limitations as in the case of complex content models. The limitations on unordered content models are probably even more unfriendly for mixed content models, which are more “free style,” than they are for complex content models.
This add-on is implemented through a mixed attribute in the xs:complexType element, which is otherwise used exactly as we’ve seen for complex content models. The effect of this attribute when its value is set to "true" is to allow any text nodes within the content model, before, between, and after the child elements. The location, the whitespace processing, and the datatype of these text nodes cannot be restricted in any way.
Let’s go back to the definition of our title element and change it to accept a reduced version of XHTML with the a link and an em element to highlight some parts of its text. The definition, which was previously done by extending a simple type to create a simple content complex type, needs to be rewritten as a complex content definition with a mixed attribute set to "true". The full definition, including the definition of the a element, the definition of a markedText complex type, and its usage to define the title element, could be:
<xs:element name="a"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute name="href" type="xs:anyURI"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:complexType name="markedText" mixed="true"> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element name="em" type="xs:token"/> <xs:element ref="a"/> </xs:choice> <xs:attribute ref="lang"/> </xs:complexType> <xs:element name="title" type="markedText"/>
This definition matches elements such as:
<title lang="en"> Being a <a href="http://dmoz.org/Shopping/Pets/Dogs/"> Dog </a> Is a <em> Full-Time </em> Job </title>
Note that the length of the title can no longer be restricted.
Mixed content models are derived exactly like the complex content models on which they have been plugged. The semantics of both methods stay exactly the same.
Mixed content complex types can be derived by extension from other mixed content complex types and the meaning will be the same. If I want to add a strong element to my markedText mixed content type, I can define the following content model:
<xs:element name="title"> <xs:complexType mixed="true"> <xs:complexContent mixed="true"> <xs:extension base="markedText"> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element name="strong" type="xs:string"/> </xs:choice> </xs:extension> </xs:complexContent> </xs:complexType> </xs:element>
One must note, though, that this extension is equivalent to:
<xs:complexType name="resultingType" mixed="true"> <xs:sequence> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element name="em" type="xs:token"/> <xs:element ref="a"/> </xs:choice> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element name="strong" type="xs:string"/> </xs:choice> </xs:sequence> <xs:attribute ref="lang"/> </xs:complexType>
This is probably not what we would like to see in practice, since this content model expects to see all the occurrences of a and em before any instance of strong. We will see later, in Chapter 12, that this specific issue can be solved using a feature named “substitution groups” instead of using xs:choice.
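The ordering constraint can be illustrated with two sketched instances of the extended title element. The text is sample data adapted from the earlier title example.

```xml
<!-- Accepted: the em element comes before the strong element -->
<title lang="en">Being a <em>Dog</em> Is a <strong>Full-Time</strong> Job</title>

<!-- Rejected: an em element appears after a strong element -->
<title lang="en">Being a <strong>Dog</strong> Is a <em>Full-Time</em> Job</title>
```

The text nodes themselves may appear anywhere in both cases; only the order of the child elements makes the second instance invalid.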
The derivation of mixed content models by restriction is also done using the method defined for complex content models, with the same constraint that each particle must be an explicit derivation of the corresponding particle of the base type. To illustrate the consequences of this constraint, let’s look again at the definition and the use of our markedText:
<xs:element name="a"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute name="href" type="xs:anyURI"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:complexType name="markedText" mixed="true"> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element name="em" type="xs:token"/> <xs:element ref="a"/> </xs:choice> <xs:attribute ref="lang"/> </xs:complexType> <xs:element name="title" type="markedText"/>
If we want to forbid em elements in our title, force the href to be an absolute http URI, and require the lang attribute to be either en or es, we need to do some refactoring to show that the a element included in our title is an explicit derivation of the general definition of a. We also need to use a global complex type definition for a instead of the previous anonymous definition:
<xs:element name="a" type="link"/>
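The link type itself is not shown here. A minimal sketch, assuming it simply gives a name to the previous anonymous definition of a, could be:

```xml
<!-- Global version of the type previously embedded in the a element -->
<xs:complexType name="link">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="href" type="xs:anyURI"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
```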
We can now either derive a new global complex type from the new
link
complex type or embed its derivation in the
definition of our title
element:
<xs:element name="title"> <xs:complexType mixed="true"> <xs:complexContent mixed="true"> <xs:restriction base="markedText"> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element name="a"> <xs:complexType> <xs:simpleContent> <xs:restriction base="link"> <xs:attribute name="href"> <xs:simpleType> <xs:restriction base="xs:anyURI"> <xs:pattern value="http://.*"/> </xs:restriction> </xs:simpleType> </xs:attribute> </xs:restriction> </xs:simpleContent> </xs:complexType> </xs:element> </xs:choice> <xs:attribute name="lang"> <xs:simpleType> <xs:restriction base="xs:language"> <xs:enumeration value="en"/> <xs:enumeration value="es"/> </xs:restriction> </xs:simpleType> </xs:attribute> </xs:restriction> </xs:complexContent> </xs:complexType> </xs:element>
This example is a caricature. In practice it would be more readable to create an intermediate global type definition to avoid embedding several derivations, but it provides an overview of this derivation process.
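As a sketch of that more readable variant, the restrictions can be moved into intermediate global definitions. The names httpLink and titleLang are hypothetical, and link is assumed to be the global complex type referenced by the a element; the constraints mirror those of the embedded version.

```xml
<!-- Languages allowed for a title -->
<xs:simpleType name="titleLang">
  <xs:restriction base="xs:language">
    <xs:enumeration value="en"/>
    <xs:enumeration value="es"/>
  </xs:restriction>
</xs:simpleType>

<!-- A link whose href must start with http:// -->
<xs:complexType name="httpLink">
  <xs:simpleContent>
    <xs:restriction base="link">
      <xs:attribute name="href">
        <xs:simpleType>
          <xs:restriction base="xs:anyURI">
            <xs:pattern value="http://.*"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:restriction>
  </xs:simpleContent>
</xs:complexType>

<!-- The title restriction now only references the global types -->
<xs:element name="title">
  <xs:complexType mixed="true">
    <xs:complexContent mixed="true">
      <xs:restriction base="markedText">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element name="a" type="httpLink"/>
        </xs:choice>
        <xs:attribute name="lang" type="titleLang"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
```

The named types can then be reused by other elements that need the same constraints.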
Since complex and mixed content models are built using the same mechanism, one may wonder what the possibilities are for deriving complex contents from mixed contents and vice versa. The answer to this question lies in the semantics of these two derivation methods.
Derivation by extension appends new content after the content of the base type, and the structure of the base type is kept unchanged. It is therefore not possible to derive a mixed content model from a complex content model: when a content model is mixed, the position of the text nodes cannot be constrained, and this would permit text nodes at any location within the content of the base type. For the same reason, it is impossible to extend a mixed content model into a complex content model, because the text nodes that are allowed in the base type would become forbidden.
Derivation by restriction defines a subset of the base type. It is forbidden to derive a mixed content model from a complex content model by restriction: the resulting type would allow text nodes that are forbidden in the base type and would expand rather than restrict the content model. The last combination is the only possible one: a mixed content model can be restricted into a complex content model. Forbidding the text nodes of a mixed content model is a valid restriction and can be done by setting the mixed attribute to “false” in the xs:complexType definition. It is even possible to derive a simple content model from a mixed content model since this is, in fact, a restriction removing the sibling elements and keeping the text nodes. This assumes, of course, that the sibling elements are optional; i.e., they have a minOccurs attribute equal to 0.
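As a sketch of this last combination, the markedText mixed type defined earlier could be restricted into an element-only type. The name markupOnly is hypothetical; the content model is repeated unchanged and only the text nodes are forbidden.

```xml
<!-- mixed="false": text nodes are no longer allowed between the elements -->
<xs:complexType name="markupOnly" mixed="false">
  <xs:complexContent>
    <xs:restriction base="markedText">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="em" type="xs:token"/>
        <xs:element ref="a"/>
      </xs:choice>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```

The lang attribute is not repeated, since it is not modified by the restriction.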
Empty content models are elements that can only accept attributes. W3C XML Schema does not include any special support for empty content models, which can be considered either complex content models without elements or simple content models with a value restricted to the null string.
W3C XML Schema considers empty content models to be the intersection between complex content models (in the case in which no compositors are specified) and simple content models (in the case in which no text nodes are expected, which W3C XML Schema handles as if an empty text node was found). We will, therefore, be able to choose between the two methods to create an empty content model. When we extended our title element to become mixed content, we carefully avoided adding empty elements, such as the HTML img or br. Let’s see how we could define a br element with its id and class attributes using both methods.
With the simple content method, this is done by defining a simple type that can only accept the empty string as a value. Strictly speaking, empty content models do not accept any whitespace between their start and end tags. Since we want to control this, we must use a datatype that does not alter whitespace, i.e., xs:string. Our empty content model is then derived by extension from this simple type:
<xs:simpleType name="empty"> <xs:restriction base="xs:string"> <xs:enumeration value=""/> </xs:restriction> </xs:simpleType> <xs:element name="br"> <xs:complexType> <xs:simpleContent> <xs:extension base="empty"> <xs:attribute name="id" type="xs:ID"/> <xs:attribute name="class" type="xs:NMTOKEN"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element>
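The complex content method is not shown here. A minimal sketch, assuming the same id and class attributes, simply declares a complex type without any compositor:

```xml
<!-- No compositor: the element accepts attributes but no child elements or text -->
<xs:element name="br">
  <xs:complexType>
    <xs:attribute name="id" type="xs:ID"/>
    <xs:attribute name="class" type="xs:NMTOKEN"/>
  </xs:complexType>
</xs:element>
```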
Each of the two empty content types keeps the derivation methods of its content model (simple or complex). The main difference between these two methods is essentially a matter of which derivations may be applied on the base type and what effect it will have.
If we try to remember and compare what we’ve learned about deriving complex and simple contents by extension, we can see that both allow addition of new attributes to the complex type. However, while we can add new subelements to complex content, we cannot change the type of the text node for a simple content model. Thus, this is the first difference between the two methods: when the empty content model is built on a simple type, it will not be possible to add anything other than attributes, while if it is built on top of a complex type, it will be possible to extend it to accept elements.
At first glance, it seems that there are fewer differences when deriving by restriction. The restriction methods of both simple and complex contents allow restricting the scope of the attributes; restricting the content, which is already empty, doesn’t seem to be very interesting. It’s time, though, to remember what we’ve learned about a simple type derivation facet that actually extends the set of valid instance documents! The “empty” simple type that we created to derive our empty simple content model has a base type equal to xs:string. When this simple type is derived through xs:whiteSpace, the result may be an expansion of the set of valid instance structures. In our case, setting xs:whiteSpace to “collapse” has the effect of accepting any sequence of whitespace between the start and end tags. This new type is not “empty,” strictly speaking, but may be useful for some (if not most) applications that normalize whitespace and make no distinction between these two cases. Such a derivation can be done on the simple content complex type like this:
<xs:simpleType name="empty"> <xs:restriction base="xs:string"> <xs:enumeration value=""/> </xs:restriction> </xs:simpleType> <xs:complexType name="emptyBr"> <xs:simpleContent> <xs:extension base="empty"> <xs:attribute name="id" type="xs:ID"/> <xs:attribute name="class" type="xs:NMTOKEN"/> </xs:extension> </xs:simpleContent> </xs:complexType> <xs:complexType name="allmostEmptyBr"> <xs:simpleContent> <xs:restriction base="emptyBr"> <xs:whiteSpace value="collapse"/> <xs:attribute name="id" type="xs:ID"/> <xs:attribute name="class" type="xs:NMTOKEN"/> </xs:restriction> </xs:simpleContent> </xs:complexType>
As we have seen, choosing a simple or complex type doesn’t make an awful lot of difference, except for extensibility. If we want to keep the possibility of adding subelements by derivation in the content model, we’d better choose an empty complex content model. However, if we want to be able to accept whitespaces in a derived type, an empty simple content model is a better bet.
We’ve covered so much ground in this chapter that it’s not obvious which features could be the most beneficial! This choice also depends on external factors such as the level of W3C XML Schema support available from the tools that will be used. For instance, some tools that produce Java classes or bindings may take advantage of complex type derivation by restriction. This is the path we will follow for now. We will create a complex content complex type that is a superset of the content models of author and character, and derive both from it by restriction.
First, we can define an empty content model with an id attribute, which can then be derived by extension for all the content models that have an id attribute:
<xs:complexType name="elementWithID">
  <xs:attribute ref="id"/>
</xs:complexType>
Note that we cannot use this type directly to define the book element, since its id attribute is a restriction of xs:ID:
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="id" type="bookID"/>
    <xs:attribute ref="available"/>
  </xs:complexType>
</xs:element>
To use our elementWithID complex type to define the book element, we need to derive by extension a complex type corresponding to the complex type of book without the restriction on the id attribute. The following code is quite verbose, but it is shown here as an exercise:
<xs:complexType name="bookTmp">
  <xs:complexContent>
    <xs:extension base="elementWithID">
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="available"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<xs:element name="book">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="bookTmp">
        <xs:sequence>
          <xs:element ref="isbn"/>
          <xs:element ref="title"/>
          <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="id" type="bookID"/>
        <xs:attribute ref="available"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
A more concise option is to derive by restriction first:
<xs:complexType name="elementWithBookID">
  <xs:complexContent>
    <xs:restriction base="elementWithID">
      <xs:attribute name="id" type="bookID"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

<xs:complexType name="book">
  <xs:complexContent>
    <xs:extension base="elementWithBookID">
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="available"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
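This second variant defines a named complex type rather than an element, so an element declaration is still needed to use it. A minimal sketch of that declaration, assuming we want to keep the element name book, could be:

```xml
<!-- Hypothetical declaration: binds the element name "book" to the named
     complex type "book" defined by extension of elementWithBookID -->
<xs:element name="book" type="book"/>
```

Element names and type names live in separate symbol spaces in W3C XML Schema, so reusing the name book for both is allowed, although a distinct type name may be easier to read.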
Using elementWithID to derive by extension a personType, which can then be used to derive the author and character elements by restriction, is straightforward, if not concise. We have already seen this example. The full schema is then:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="string255">
    <xs:restriction base="xs:token">
      <xs:maxLength value="255"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="string32">
    <xs:restriction base="xs:token">
      <xs:maxLength value="32"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="isbn">
    <xs:restriction base="xs:NMTOKEN">
      <xs:length value="10"/>
      <xs:pattern value="[0-9]{9}[0-9X]"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="bookID">
    <xs:restriction base="xs:ID">
      <xs:pattern value="b[0-9]{9}[0-9X]"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="supportedLanguages">
    <xs:restriction base="xs:language">
      <xs:enumeration value="en"/>
      <xs:enumeration value="es"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="date">
    <xs:restriction base="xs:date">
      <xs:pattern value="[^:Z]*"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="name" type="string32"/>
  <xs:element name="qualification" type="string255"/>
  <xs:element name="born" type="date"/>
  <xs:element name="dead" type="date"/>
  <xs:element name="isbn" type="isbn"/>

  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="supportedLanguages"/>

  <xs:complexType name="elementWithID">
    <xs:attribute ref="id"/>
  </xs:complexType>
  <xs:complexType name="bookTmp">
    <xs:complexContent>
      <xs:extension base="elementWithID">
        <xs:sequence>
          <xs:element ref="isbn"/>
          <xs:element ref="title"/>
          <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute ref="available"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="personType">
    <xs:complexContent>
      <xs:extension base="elementWithID">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="dead" minOccurs="0"/>
          <xs:element ref="qualification" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="string255">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:complexContent>
        <xs:restriction base="bookTmp">
          <xs:sequence>
            <xs:element ref="isbn"/>
            <xs:element ref="title"/>
            <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute name="id" type="bookID"/>
          <xs:attribute ref="available"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:complexContent>
        <xs:restriction base="personType">
          <xs:sequence>
            <xs:element ref="name"/>
            <xs:element ref="born"/>
            <xs:element ref="dead" minOccurs="0"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:complexContent>
        <xs:restriction base="personType">
          <xs:sequence>
            <xs:element ref="name"/>
            <xs:element ref="born"/>
            <xs:element ref="qualification"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
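As a quick check of what this schema describes, here is a sketch of an instance document that is intended to conform to it. The identifier values (CMS, PP) and the availability flag are illustrative assumptions; only the element and attribute names come from the schema.

```xml
<?xml version="1.0"?>
<library>
  <!-- id must match the bookID pattern: "b" followed by the ten ISBN characters -->
  <book id="b0836217462" available="true">
    <isbn>0836217462</isbn>
    <!-- lang is limited to "en" or "es" by supportedLanguages -->
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <!-- author: name, born, optional dead; no qualification allowed -->
    <author id="CMS">
      <name>Charles M. Schulz</name>
      <born>1922-11-26</born>
      <dead>2000-02-12</dead>
    </author>
    <!-- character: name, born, mandatory qualification; no dead allowed -->
    <character id="PP">
      <name>Peppermint Patty</name>
      <born>1966-08-22</born>
      <qualification>bold, brash and tomboyish</qualification>
    </character>
  </book>
</library>
```

Both author and character are restrictions of personType, which is why each accepts only a subset of the four child elements that personType allows.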
Since the derivation methods for complex types do not widen the scope of structures that can be defined by W3C XML Schema and are rather complex, their usage is controversial. Kohsuke Kawaguchi has published a convincing article on XML.com (http://www.xml.com/pub/a/2001/06/06/schemasimple.html) that explains how to avoid using complex type derivations without losing much in modularity.
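To give an idea of what a derivation-free design can look like, here is a sketch that shares the common parts of author and character through a named model group and an attribute group instead of a base complex type. It is written in the spirit of that approach and is not taken from the article; the group names personStart and idAttribute are hypothetical, and the leaf elements use built-in types to keep the example self-contained.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Simplified leaf declarations (assumption: the full schema uses
       string32, string255, and its own date type instead) -->
  <xs:element name="name" type="xs:token"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="qualification" type="xs:token"/>

  <!-- Shared attribute, reused by reference rather than by type extension -->
  <xs:attributeGroup name="idAttribute">
    <xs:attribute name="id" type="xs:ID"/>
  </xs:attributeGroup>

  <!-- Shared start of the content model for any person-like element -->
  <xs:group name="personStart">
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
    </xs:sequence>
  </xs:group>

  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="personStart"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
      <xs:attributeGroup ref="idAttribute"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="personStart"/>
        <xs:element ref="qualification"/>
      </xs:sequence>
      <xs:attributeGroup ref="idAttribute"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The author and character elements accept the same instances as in the derived version, but no type hierarchy relates them, so tools that rely on derivation (for example, to generate class inheritance) would get no hint from this schema.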