Chapter 7. Creating Complex Datatypes

We have seen how to create simple datatypes that can be applied to attributes or simple type elements. It’s now time to learn how complex types can be created.

Simple Versus Complex Types

Before we start diving into complex types, I would like to reiterate the fundamental difference between simple and complex types. The simple datatypes that we saw in the previous chapters describe the content of a text node or an attribute value. They are completely independent of the other nodes and, therefore, independent of the markup. The same datatype system can be used to describe the content of any format, even if it is not XML but an RDBMS (Relational DataBase Management System), CSV (Comma Separated Values), or a fixed-sized text format.

The complex types discussed in this chapter (and, more specifically, the complex content models) are, on the contrary, a description of the markup structure. They use simple datatypes to describe their leaf element nodes and attribute values, but have no other links with simple datatypes. Keep this in mind, especially when we study the derivation methods for complex datatypes. Even though the names (and elements) are sometimes the same as those we’ve seen for simple datatypes, their meaning, usage, and content models are different. When we discuss the xs:restriction element, for instance, you will see that this element has a different meaning and content model for simple types than it does for complex types. (In fact, this element even has two different content models for complex types, depending on its context.) Among the different content models composing complex types, the simple and mixed content models are special cases in which elements may have text nodes.

There is a kind of no man’s land between simple types and complex contents, where the distinction between data and markup (or datatypes and structures) becomes fuzzier for W3C XML Schema. This ambiguity is a frequent source of confusion and complexity for human readers, but also for W3C XML Schema editing software and reference guides.

Examining the Landscape

W3C XML Schema has introduced many different ways of reaching your information modeling goals, and we will try to draw a global picture of the landscape to avoid getting lost! We have to make two key choices: which content model to use, and whether to create new types or to derive them from previously defined types.

Content Models

Let’s go back over the definition of the content models and try to illustrate the different cases in Table 7-1. It shows the relationship between content model and child text and element nodes.

Table 7-1. Content models

Content model    Mixed   Complex   Simple   Empty
Child elements   Yes     Yes       No       No
Child text       Yes     No        Yes      No

W3C XML Schema provides two main ways to define complex types: one for complex content models and one for simple content models. It also offers several tricks for piggybacking the definition of mixed and empty contents on these definitions: mixed contents are obtained through a mixed attribute on a complex type definition, and empty contents are obtained either by declaring no elements in a complex content model or by assigning a simple content that imposes an empty value.
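
As a rough sketch of the four cases in Table 7-1, the following global definitions show one complex type per content model. The type, element, and attribute names (complexExample, child, flag, and so on) are hypothetical and chosen only for illustration:

<!-- Complex content: child elements, no text -->
<xs:complexType name="complexExample">
  <xs:sequence>
    <xs:element name="child" type="xs:token"/>
  </xs:sequence>
</xs:complexType>

<!-- Mixed content: child elements and text, through the mixed attribute -->
<xs:complexType name="mixedExample" mixed="true">
  <xs:sequence>
    <xs:element name="child" type="xs:token" minOccurs="0"
      maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<!-- Simple content: text and attributes, no child elements -->
<xs:complexType name="simpleExample">
  <xs:simpleContent>
    <xs:extension base="xs:token">
      <xs:attribute name="flag" type="xs:boolean"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

<!-- Empty content: no elements are declared, only an attribute -->
<xs:complexType name="emptyExample">
  <xs:attribute name="flag" type="xs:boolean"/>
</xs:complexType>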

Named Versus Anonymous Types

Like simple datatypes, complex datatypes can be either named (i.e., global) or anonymous (i.e., local). Global definitions must have a name and be a top-level element that is included directly in the xs:schema document element. The global definitions can then be referenced directly in an element definition using the element type attribute; new complex types can be derived from the global definitions. Local complex types are defined directly where they are needed in a schema; they are anonymous (i.e., no name attribute); and they have a local scope.

Creation Versus Derivation

For simple datatypes, there is no choice: we cannot create new primitive datatypes, so new simple datatypes must be defined by derivation. For complex datatypes, the situation is the opposite: there are no primitive complex types, so complex types must be created before we can do any derivation. Once we have created our first complex types, we have the choice of defining new content models from scratch or deriving them by extension or restriction from previously defined complex types. This makes it possible for libraries of complex datatypes to be reused within a schema or between different schemas. As far as validation is concerned, these derivations change nothing compared to direct definitions: a derived type describes a model that could also have been defined from scratch, and it applies to exactly the same instance documents. On the other hand, some applications might be able to draw conclusions from the chain of derivations.

Simple Content Models

We will start by looking at complex types containing simple content because they are closest to simple types, which we’ve seen recently, and they also provide an easier transition to the more complex world of complex contents. We will not discuss the creation and derivation of simple types, already covered in Chapter 5, but instead will focus on complex types’ simple content models (i.e., elements having only text nodes and attributes) and study how they are created and derived.

Creation of Simple Content Models

Complex types with simple content models are created by adding a list of attributes to a simple type. The operation of adding attributes to a simple type to create a simple content complex type is called an extension of the simple type. The syntax is straightforward and we have already seen examples of such creation in Chapter 4:

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="string255">
        <xs:attribute ref="lang"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

The only constraint to note here is that the definition of the simple type cannot be embedded directly in the xs:extension element; it must be referenced through the base attribute.

This same syntax, with the same meaning, can be used to create global complex types, which can be used to define elements:

<xs:complexType name="tokenWithLang">
  <xs:simpleContent>
    <xs:extension base="xs:token">
      <xs:attribute ref="lang"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
          
<xs:element name="title" type="tokenWithLang"/>

Derivation from Simple Contents

Complex types provide two options for deriving new types from simple content models: extension and restriction.

Derivation by extension

Derivation by extension is reserved for complex types and has no equivalent for simple types. It increases the number of child elements or attributes allowed or expected in the complex type. For simple content complex types, child elements cannot be added, so the extension is identical to the method used to create a simple content complex type from a simple type. To add an attribute to the complex type tokenWithLang shown in the previous example, we could write:

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="tokenWithLang">
        <xs:attribute name="note" type="xs:token"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
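
Assuming the global lang attribute used throughout our examples, an instance conforming to this definition might look like the following (the attribute values are only illustrative):

<title lang="en" note="paperback edition">
  Being a Dog Is a Full-Time Job
</title>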

Derivation by restriction

The derivation by restriction of simple content complex types is a feature at the border between the two parts of W3C XML Schema (Part 1: Structures and Part 2: Datatypes). It’s also very similar to the derivation by restriction of simple datatypes, discussed in Chapter 6. The only difference between the derivations by restriction in these two contexts is that the derivation by restriction of a simple content complex type allows not only restriction of the scope of the text node, but also restriction of the scope of the attributes. This restriction follows the same principle as the restriction of a simple type: any instance structure deemed valid per the restricted type must also be valid per the base type (with the exception already mentioned for the xs:whiteSpace facet).

The syntax used to restrict the text child is the same as the syntax used to derive simple types by restriction. The facets are the same as well. These facets must be followed by the new list of attributes, which may have different types as long as they are derived from the types of the attributes from the base type. Attributes that are not mandatory in the base type can be specified in the new list as “prohibited,” and attributes that are not included are considered unchanged. Following are some examples of derivations that start from a simple content datatype equivalent to the content model just shown:

<xs:complexType name="tokenWithLangAndNote">
  <xs:simpleContent>
    <xs:extension base="xs:token">
      <xs:attribute name="lang" type="xs:language"/>
      <xs:attribute name="note" type="xs:token"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

We can first show how to restrict the length of the text node, as we’ve done for simple types:

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" type="xs:token"/>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

To remove the note attribute from the element title, we declare note to be prohibited in the list of attributes in the restriction:

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" use="prohibited"/>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

We can also restrict the datatype by restricting its attributes. For instance, if we want to restrict the number of possible languages, we can do it directly in the definition of the lang attribute in the derived type:

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang">
          <xs:simpleType>
            <xs:restriction base="xs:language">
              <xs:enumeration value="en"/>
              <xs:enumeration value="es"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
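
To make the effect of this last restriction concrete, here is a sketch of how a few instances would fare against it. The instance values are illustrative assumptions; only the type definitions come from the examples above:

<!-- Valid: lang is one of the enumerated values -->
<title lang="en">
  Being a Dog Is a Full-Time Job
</title>

<!-- Valid: note is not listed in the restriction, so it is unchanged -->
<title lang="es" note="translated edition">
  Being a Dog Is a Full-Time Job
</title>

<!-- Invalid: "fr" is a valid xs:language but is not in the enumeration -->
<title lang="fr">
  Being a Dog Is a Full-Time Job
</title>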

Comparison of these two methods

Despite apparent similarities, derivations by extension and restriction do not have much more in common than deriving new simple content types from base types! Derivation by extension can only add new attributes. It can neither change the datatype of the text node nor the type of an attribute defined in its base type. Derivation by restriction appears to be more flexible and can restrict the datatype of the text node and of the attributes of the base type. It can also remove attributes that are not mandatory in its base type.

Complex Content Models

Restricting or extending simple content models is useful, but XML is not very useful without more complex models.

Creation of Complex Content

A complex content is created by defining the list (and order) of its elements and attributes. We have already seen a couple of examples of complex content models, defined as local complex types in Chapter 1 and Chapter 2:

<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="book" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

These examples show the basic structure of a complex type with a complex content definition: the xs:complexType element holds the definition. Here, this definition is local (xs:complexType is not top-level since it is included under an xs:element element) and, thus, anonymous. Under xs:complexType, we find the sequence of children elements (xs:sequence) and the list of attributes.

Compositors and particles

In these examples, the xs:sequence elements have a role as “compositors” and the xs:element elements, which are included in xs:sequence, play a role of “particle.” This simple scenario may be extended using other compositors and particles.

W3C XML Schema defines three different compositors: xs:sequence, to define ordered lists of particles; xs:choice, to define a choice of one particle among several; and xs:all, to define unordered lists of particles. The xs:sequence and xs:choice compositors can define their own number of occurrences using the minOccurs and maxOccurs attributes, and they can be used as particles (some important restrictions apply to xs:all, which cannot be used as a particle, as we will see in the next section).

The particles are xs:element, xs:sequence, xs:choice, plus xs:any and xs:group, which we will see later in the section. The ability to include compositors within compositors is key to defining complex structures, although it is unfortunately subject to W3C XML Schema’s allergy to “nondeterminism.”
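
As an illustration of a compositor used as a particle with its own number of occurrences, the following sketch lets authors and characters be mixed in any order after the title. The element name annotated-book is hypothetical; title, author, and character are the elements used elsewhere in our library:

<xs:element name="annotated-book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="title"/>
      <!-- The choice is itself a particle of the sequence and may
           be repeated any number of times, including zero -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="author"/>
        <xs:element ref="character"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Each element name appears in only one particle of this model, so a processor reading an author or a character element always knows which particle it matches.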

To give an idea of the kind of structures that can be defined, let’s suppose that the names in our library may be expressed in two different ways: either as a name element, as we have shown up to now, or as three different elements to define the first, middle, and last name (the middle name should be optional). Names could then be expressed as one of the three following combinations:

<first-name>
  Charles
</first-name>
<middle-name>
  M
</middle-name>
<last-name>
  Schulz
</last-name>

or:

<first-name>
  Peppermint
</first-name>
<last-name>
  Patty
</last-name>

or:

<name>
  Snoopy
</name>

To describe this, we will replace the reference to the name element with a choice between either a name element or a sequence of first-name, middle-name (optional), and last-name. The definition of author then becomes:

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:choice>
        <xs:element ref="name"/>
        <xs:sequence>
          <xs:element ref="first-name"/>
          <xs:element ref="middle-name" minOccurs="0"/>
          <xs:element ref="last-name"/>
        </xs:sequence>
      </xs:choice>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

The name element also appears in the character element. We could copy and paste the xs:choice structure to replace it there too, but we would rather take this opportunity to introduce a new feature that is very handy for manipulating reusable sets of elements.

Element and attribute groups

Element and attribute groups are containers in which sets of elements and attributes may be embedded and manipulated as a whole. These simple and flexible structures are very convenient for defining bits of content models that can be reused in multiple locations, such as the xs:choice structure that we created for our name.

The first step is to define the element group. The definition needs to be named and global (i.e., immediately under the xs:schema element) and has the following form:

<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="first-name"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>

These groups can then be used by reference as particles within compositors:

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

Groups of attributes can be created in the same way using xs:attributeGroup:

<xs:attributeGroup name="bookAttributes">
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
</xs:attributeGroup>
             
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/> 
      <xs:element ref="character" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attributeGroup ref="bookAttributes"/>
  </xs:complexType>
</xs:element>

Unique Particle Attribution Rule

Let’s try a new example to illustrate one of the most constraining limitations of W3C XML Schema. We may want to describe all the pages of our books and to have a different description using different elements, such as odd-page and even-page for odd and even pages that require a different pagination. We can try to describe the new content model in the following group:

<xs:group name="pages">
  <xs:sequence>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="odd-page"/>
      <xs:element ref="even-page"/>
    </xs:sequence>
    <xs:element ref="odd-page" minOccurs="0"/>
  </xs:sequence>
</xs:group>

This seems like a simple, smart way to describe the sequences of odd and even pages: a sequence of odd and even pages, possibly followed by a last odd page. The model covers books with an odd or even number of pages as well as tiny booklets with a single page. Neither XSV nor Xerces appears to enjoy it, though:

XSV:

vdv@evlist:~/w3c-xml-schema/user/examples/complex-types$ xsd -n first-ambigous.xsd 
first-ambigous.xml
using xsv (default)
<?xml version='1.0'?>
<xsv docElt='{None}library' instanceAssessed='true' instanceErrors='0' 
rootType='[Anonymous]' schemaDocs='first-ambigous.xsd' schemaErrors='1' 
target='/home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xml' 
validation='strict' version='XSV 1.203.2.20/1.106.2.11 of 2001/11/01 17:07:43' 
xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='/home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xsd' 
outcome='success' source='command line'/>
<schemaError char='7' line='65' phase='instance' 
resource='file:///home/vdv/w3c-xml-schema/user/examples/complex-types/first-ambigous.xsd'>
non-deterministic content model for type None: {None}:odd-page/{None}:odd-page
</schemaError>
</xsv>

Xerces:

vdv@evlist:~/w3c-xml-schema/user/examples/complex-types$ xsd -n first-ambigous.xsd 
-p xerces-cvs first-ambigous.xml
using xerces-cvs
startDocument
[Error] first-ambigous.xml:2:10: Error: cos-nonambig: (,odd-page) 
and (,odd-page) violate the "Unique Particle Attribution" rule.
endDocument

Misled by the apparent flexibility of construction with compositors and particles, we violated an ancient taboo known in SGML as “ambiguous content models,” which was imported into XML’s DTDs as “nondeterministic content models,” and preserved by W3C XML Schema as the “Unique Particle Attribution Rule.”

In practice, this rule adds a significant amount of complexity to writing a W3C XML Schema, since it must be checked after the schema processor has resolved all the many features that allow you to define, redefine, derive, import, reference, and substitute complex types. The Recommendation recognizes that “given the presence of element substitution groups and wildcards, the concise expression of this constraint is difficult.” Once these features have been resolved, the remaining constraint requires that a schema processor never have any doubt about which branch it is in while validating an element, looking only at that element. Even the previous example, which was as simple as possible, runs into this problem: when a schema processor meets the first odd-page element, it has no way of knowing whether the page will be followed by an even-page element without first looking ahead to the next element. This is a violation of the Unique Particle Attribution Rule.
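
To see the problem from the processor’s point of view, consider these two instance fragments, both of which the pages group is meant to accept (the content of the page elements is omitted and is not relevant here):

<!-- A booklet with a single page: odd-page can only match the
     final, optional odd-page particle -->
<odd-page/>

<!-- A two-page book: the same first element must now match the
     odd-page particle inside the repeated sequence -->
<odd-page/>
<even-page/>

In both cases the first element is identical, and only the element that follows (or its absence) tells which of the two odd-page particles was matched.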

This example, adapted from an example describing a chess board, is one of the famous instances in which the content model cannot be written in a “deterministic” way. This is not always the case, and many nondeterministic constructions describe content models that may be rewritten in a deterministic fashion. We should differentiate those that are fundamentally nondeterministic from those that are only “accidentally” nondeterministic. Let’s go back to our example of a “name” group that can have two different content models, and imagine that instead of using first-name, we reused the element name name. The content model is now either name or a sequence of name, middle-name, and last-name:

<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
             
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

Here again, when the processor meets a name element, it has no way of knowing (without looking ahead) if this element matches the first or the second branch of the choice. In this case, though, the content model may be simplified if we note that the name element is common to both branches and that, in fact, we now have a mandatory name element followed by an optional sequence of an optional middle-name and a mandatory last-name. The content model can then be rewritten in a deterministic way as:

<xs:group name="name">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:sequence minOccurs="0">
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:sequence>
</xs:group>

This is a slippery path, though, which frequently depends on slight nuances in the content model and leads to schemas that are very difficult to maintain and may require unsatisfactory compromises. If the requirement for the content model we have just written is changed and the name element in the second branch is no longer mandatory, then we are in trouble. The new content model is as follows:

<xs:group name="name">
  <xs:choice>
    <xs:element ref="name"/>
    <xs:sequence>
      <xs:element ref="name" minOccurs="0"/>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>

But this model is nondeterministic for the same reason that the previous one was, and we need to reevaluate the different possible combinations to find that the new content model can now be expressed as:

<xs:group name="name">
  <xs:choice>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:sequence minOccurs="0">
        <xs:element ref="middle-name" minOccurs="0"/>
        <xs:element ref="last-name"/>
      </xs:sequence>
    </xs:sequence>
    <xs:sequence>
      <xs:element ref="middle-name" minOccurs="0"/>
      <xs:element ref="last-name"/>
    </xs:sequence>
  </xs:choice>
</xs:group>
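
As a check on this rewriting, the following instance fragments sketch the five combinations that this last group accepts, which are the same as those accepted by the nondeterministic version. The text values are only illustrative:

<!-- First branch: name alone -->
<name>Snoopy</name>

<!-- First branch: name followed by last-name -->
<name>Peppermint</name>
<last-name>Patty</last-name>

<!-- First branch: name, middle-name, and last-name -->
<name>Charles</name>
<middle-name>M</middle-name>
<last-name>Schulz</last-name>

<!-- Second branch: last-name alone -->
<last-name>Schulz</last-name>

<!-- Second branch: middle-name followed by last-name -->
<middle-name>M</middle-name>
<last-name>Schulz</last-name>

In each case, the first element met (name on one side, middle-name or last-name on the other) is enough to tell which branch of the choice applies.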

Note

Formal theories and algorithms can rewrite nondeterministic content models in a deterministic way when possible. Hopefully, W3C XML Schema development tools will integrate some of these algorithms to propose an alternative when a schema author creates nondeterministic content models.

Ambiguous content models were already a controversial issue in the 1990s among the SGML community. The restriction was maintained in XML DTDs under the name “nondeterministic content models” in order to preserve compatibility with SGML parsers, despite the dissent of Tim Bray, Jean Paoli, and Peter Sharpe, three influential members of the XML Special Interest Group. The motivation to maintain the restriction in W3C XML Schema is to keep schema processors simple to implement and to allow implementations through finite state machines (FSM). The execution time of these automata could grow exponentially when the Unique Particle Attribution Rule is violated. This decision has been heavily criticized by experts including Joe English, James Clark, and Murata Makoto, who have proved that other simple algorithms might be used that keep the processing time linear when this rule is not met. This is also one of the main differences between the descriptive powers of W3C XML Schema and of schema languages such as RELAX, TREX, and RELAX NG, which do not impose this rule.

Consistent Declaration Rule

Although not related, strictly speaking, the Unique Particle Attribution Rule and the Consistent Declaration Rule are often associated, since, in practice, when the Consistent Declaration Rule is violated, the Unique Particle Attribution Rule is often violated too. This new rule is much easier to explain and understand, since it only states that W3C XML Schema explicitly forbids choices between elements with the same name and different types, such as in the following:

<xs:choice>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="name">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="first-name"/>
        <xs:element ref="middle-name"/>
        <xs:element ref="last-name"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:choice>

We will see a workaround using the xsi:type attribute, which may be used by some applications, in Chapter 11.

Limitations on unordered content models

While useful, unordered content models have their own sets of limitations.

Limitations of xs:all

Unordered content models (i.e., content models that do not impose any order on the children elements) not only increase the risks of nondeterministic content models, but are also an important complexity factor for schema processors. For the sake of implementation simplicity, the Recommendation has imposed huge limitations on the xs:all element, which make it hardly usable in practice: xs:all cannot be used as a particle, but only as a compositor; xs:all cannot have a number of occurrences greater than one; the particles included within xs:all must be xs:element; and these particles must not specify numbers of occurrences greater than one.

To illustrate these limitations, let’s imagine we have decided to simplify the life of document producers and want to create a vocabulary that doesn’t care about the relative order of children elements. With a simple vocabulary such as the one defined in our first schema, this wouldn’t add a big burden to the applications handling our vocabulary. When you think about it, there is no special reason to impose the definition of the title of a book after its ISBN number or the definition of the list of authors before the list of characters. The first content model that may be affected by this decision is the content model of the book element:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/> 
      <xs:element ref="character" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
    <xs:attribute ref="available"/>
  </xs:complexType>
</xs:element>

Unfortunately, here the xs:sequence cannot be replaced by xs:all, since two of the children elements (author and character) have a maximum number of occurrences that is “unbounded” and thus higher than one. The second group of candidates includes the content models of author and character, which are relatively similar:

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
                
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

The good news here is that both author and character match the criteria for xs:all, so we can write:

<xs:element name="author">
  <xs:complexType>
    <xs:all>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
                
<xs:element name="character">
  <xs:complexType>
    <xs:all>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

We now have two elements (author and character) in which the order of children elements is not significant. One may question, though, whether this is very interesting, since this independence is not consistent throughout the schema. More importantly, we must note that we have lost a great deal of flexibility and extensibility by using an xs:all compositor. Since the maximum number of occurrences for each child element needs to be one, we can no longer, for instance, change the number of occurrences of the qualification element to accept several qualifications in different languages. And since the particles used in xs:all cannot be compositors or groups, we can’t extend the content model to accept both name and the sequence first-name, middle-name, and last-name either.
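
The two extensions just mentioned can be sketched as follows. Both definitions are deliberately invalid under the rules described above and are shown only to illustrate what xs:all rejects; the name group is the one defined earlier in this chapter:

<!-- NOT valid: a particle within xs:all cannot occur more than once -->
<xs:element name="character">
  <xs:complexType>
    <xs:all>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification" maxOccurs="unbounded"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

<!-- NOT valid: xs:all can contain only xs:element particles, not groups -->
<xs:element name="author">
  <xs:complexType>
    <xs:all>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:all>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>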

Since xs:all appears to be pretty ineffective in general, there are a couple of workarounds that may be proposed for people who would like to develop order-independent vocabularies.

Adapting the structure of your document

The first workaround, which may be used only if you are creating your own vocabulary from scratch, is to adapt the structures of your document to the constraints of xs:all. In practice, this means that each time we would need an xs:choice, an xs:sequence, or an element with more than one occurrence, we add a new element as a container. For instance, we will create containers named authors and characters that will encapsulate the multiple occurrences of author and character. The result is instance documents such as:

<?xml version="1.0"?> 
<library>
  <book id="b0836217462" available="true">
    <title lang="en">
      Being a Dog Is a Full-Time Job
    </title>
    <isbn>
      0836217462
    </isbn>
    <authors>
      <author id="CMS">
        <born>
          1922-11-26
        </born>
        <dead>
          2000-02-12
        </dead>
        <name>
          Charles M Schulz
        </name>
      </author>
    </authors>
    <characters>
      <character id="PP">
        <name>
          Peppermint Patty
        </name>
        <qualification>
          bold, brash and tomboyish
        </qualification>
        <born>
          1966-08-22
        </born>
      </character>
      <character id="Snoopy">
        <born>
          1950-10-04
        </born>
        <name>
          Snoopy
        </name>
        <qualification>
          extroverted beagle
        </qualification>
      </character>
      <character id="Schroeder">
        <qualification>
          brought classical music to the Peanuts strip
        </qualification>
        <name>
          Schroeder
        </name>
        <born>
          1951-05-30
        </born>
      </character>
      <character id="Lucy">
        <name>
          Lucy
        </name>
        <born>
          1952-03-03
        </born>
        <qualification>
          bossy, crabby and selfish
        </qualification>
      </character>
    </characters>
  </book>
</library>

This instance document is defined by a full schema, which could be:

<?xml version="1.0"?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:token"/>
  <xs:element name="qualification" type="xs:token"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:NMTOKEN"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:token">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:all>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="authors"/>
        <xs:element ref="characters"/>
      </xs:all>
      <xs:attribute ref="id"/>
      <xs:attribute ref="available"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="characters">
    <xs:complexType>
      <xs:sequence> 
        <xs:element ref="character" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

This adaptation of the instance document will be more painful if we want to implement our alternative “name” content model. Since we cannot include a xs:choice in a xs:all compositor, we have to add a first level of container (name), which is always the same, and a second level of container (simple-name or complex-name), which expresses the choice. This leads to instance documents such as:

<?xml version="1.0"?> 
<library>
  <book id="b0836217462" available="true">
    <title lang="en">
      Being a Dog Is a Full-Time Job
    </title>
    <isbn>
      0836217462
    </isbn>
    <authors>
      <author id="CMS">
        <born>
          1922-11-26
        </born>
        <dead>
          2000-02-12
        </dead>
        <name>
          <complex-name>
            <last-name>
              Schulz
            </last-name>
            <first-name>
              Charles
            </first-name>
            <middle-name>
              M
            </middle-name>
          </complex-name>
        </name>
      </author>
    </authors>
    <characters>
      <character id="PP">
        <name>
          <complex-name>
            <first-name>
              Peppermint
            </first-name>
            <last-name>
              Patty
            </last-name>
          </complex-name>
        </name>
        <qualification>
          bold, brash and tomboyish
        </qualification>
        <born>
          1966-08-22
        </born>
      </character>
      <character id="Snoopy">
        <born>
          1950-10-04
        </born>
        <name>
          <simple-name>
            Snoopy
          </simple-name>
        </name>
        <qualification>
          extroverted beagle
        </qualification>
      </character>
      <character id="Schroeder">
        <qualification>
          brought classical music to the Peanuts strip
        </qualification>
        <name>
          <simple-name>
            Schroeder
          </simple-name>
        </name>
        <born>
          1951-05-30
        </born>
      </character>
      <character id="Lucy">
        <name>
          <simple-name>
            Lucy
          </simple-name>
        </name>
        <born>
          1952-03-03
        </born>
        <qualification>
          bossy, crabby and selfish
        </qualification>
      </character>
    </characters>
  </book>
</library>

The adaptation of the schema is then straightforward and could be (keeping a flat design):

<?xml version="1.0"?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="simple-name" type="xs:token"/>
  <xs:element name="first-name" type="xs:token"/>
  <xs:element name="middle-name" type="xs:token"/>
  <xs:element name="last-name" type="xs:token"/>
  <xs:element name="qualification" type="xs:token"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="isbn" type="xs:NMTOKEN"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="name">
    <xs:complexType>
      <xs:choice>
        <xs:element ref="simple-name"/>
        <xs:element ref="complex-name"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="complex-name">
    <xs:complexType>
      <xs:all>
        <xs:element ref="first-name"/>
        <xs:element ref="middle-name" minOccurs="0"/>
        <xs:element ref="last-name"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:token">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:all>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element ref="authors"/>
        <xs:element ref="characters"/>
      </xs:all>
      <xs:attribute ref="id"/>
      <xs:attribute ref="available"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="characters">
    <xs:complexType>
      <xs:sequence> 
        <xs:element ref="character" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
        <xs:element ref="qualification"/>
      </xs:all>
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

This process may be generalized and used for purposes other than adapting instance documents to the constraints of xs:all. It is interesting to note that we have “externalized” the complexity, which was previously hidden from the instance document in the schema, to bring the full structure of the content model into the instance document itself. The choices and sequences (an element with multiple occurrences is nothing more than an implicit sequence) are now expressed through containers in the instance documents. Since the structure is more apparent in the instance documents, it can be considered more readable; some people find it a good practice to use such containers.

Using xs:choice instead of xs:all

When it is not possible or not practical to adapt the structure of a document to the limitations of xs:all, another workaround that may be used is to replace xs:all compositors by xs:choice, when possible. This trick is far less generic than the adaptation of structures we just saw, and it may be surprising that two compositors with a very different meaning could be “interchanged.” This applies only when a loose control on the number of occurrences can be applied, such as in a container that accepts both author and character elements in any order with any number of occurrences. Such a container can be defined as:

<xs:element name="persons">
  <xs:complexType>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="author"/>
      <xs:element ref="character"/>
    </xs:choice>
  </xs:complexType>
</xs:element>

This definition has the same meaning as the following xs:all definition, which is forbidden:

<xs:element name="persons">
  <xs:complexType>
    <xs:all>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/> 
      <xs:element ref="character" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:all>
  </xs:complexType>
</xs:element>
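
To make this concrete, here is a sketch of an instance fragment that the xs:choice definition of persons accepts. It assumes the flat definitions of author and character shown earlier in this chapter (with name as a simple xs:token element); the particular selection of persons is only an illustration:

```xml
<!-- author and character elements may be freely interleaved,
     and each may occur any number of times (including zero) -->
<persons>
  <character id="Snoopy">
    <name>Snoopy</name>
    <born>1950-10-04</born>
    <qualification>extroverted beagle</qualification>
  </character>
  <author id="CMS">
    <name>Charles M Schulz</name>
    <born>1922-11-26</born>
    <dead>2000-02-12</dead>
  </author>
  <character id="Lucy">
    <name>Lucy</name>
    <born>1952-03-03</born>
    <qualification>bossy, crabby and selfish</qualification>
  </character>
</persons>
```

An empty persons element is also valid, since the xs:choice itself has a minOccurs attribute equal to 0.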

Derivation of Complex Content

Complex contents can also be derived, by extension or by restriction, from complex types. Before we see the details of these mechanisms, note that they are not symmetrical and their semantics are very different. The derivation of a complex content by restriction is a restriction of the set of matching instances. All the instance structures that match the restricted complex type must also match the base complex type. The derivation of a complex content by extension of a complex type is an extension of the content model by addition of new particles. A content that matches the base type does not necessarily match the extended complex type. This also means that there is no “roundtrip”: in the general case, neither a restricted complex type nor an extended type can be extended or restricted back into its base type.

Derivation by extension

Derivation by extension is similar to the extension of simple content complex types. It is functionally very similar to joining groups of elements and attributes to create a new complex type. The idea behind this feature is to let people add new elements and attributes after those already defined in the base type. This is virtually equivalent to creating a sequence with the current content model followed by the new content model. Let’s go back to our library to illustrate this. The content models of our elements author and character are relatively similar: author expects name, born, and dead, while character expects name, born, and qualification. If we want to use a derivation by extension, we can first create a base type that contains the first elements common to the content model of both elements:

<xs:complexType name="basePerson">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>

It is then possible to use derivations by extension to append new elements (dead for author and qualification for character) after those that have already been defined in the base type:

<xs:element name="author">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

Technically, the meaning of this derivation is equivalent to creating a sequence containing the compositor used to define the base type followed by the compositor included in the xs:extension element. Thus, the content models of these elements are similar to the content models defined as:

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="born"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element ref="qualification"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

This equivalence clearly shows the nature of this derivation mechanism. As stated in the introduction of complex content derivation mechanisms, this is not an extension of the set of valid instance structures. A character element, with its mandatory qualification, is not valid per the basePerson content model; its content model is rather the merge of two content models. This merge itself is subject to limitations: you cannot choose the point where the new content model is inserted; this addition is always done by appending the new compositor after the one of the base type. In our example, if the common elements name and born were not the first two elements, we couldn’t have used a derivation by extension.
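
As an illustration, the two fragments below sketch what the extended character type accepts and rejects. They are separate fragments rather than a single document, and they assume that name is a simple xs:token element:

```xml
<!-- Valid: the content inherited from basePerson comes first,
     followed by the appended qualification -->
<character id="Lucy">
  <name>Lucy</name>
  <born>1952-03-03</born>
  <qualification>bossy, crabby and selfish</qualification>
</character>

<!-- Not valid: qualification cannot be placed before the
     content defined in the base type -->
<character id="Lucy">
  <qualification>bossy, crabby and selfish</qualification>
  <name>Lucy</name>
  <born>1952-03-03</born>
</character>
```

Neither fragment would be valid against basePerson itself, which knows nothing about qualification.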

Another caveat in derivations by extension is that we can’t choose the compositor that is used to merge the two content models. This means that when we derive content models using xs:choice as compositors, it is not the scope of the choices that is extended, but rather the choices that are included in a xs:sequence. We could, for instance, extend the content model of the element persons, which we just created and which could be defined as a global complex type:

<xs:complexType name="basePersons">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="author"/>
    <xs:element ref="character"/>
  </xs:choice>
</xs:complexType>

If we add a new element using a derivation by extension:

<xs:complexType name="persons">
  <xs:complexContent>
    <xs:extension base="basePersons">
      <xs:sequence> 
        <xs:element name="editor" type="xs:token" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

The result is a content type that is equivalent to:

<xs:complexType name="personsEquivalent">
  <xs:sequence>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="author"/>
      <xs:element ref="character"/>
    </xs:choice>
    <xs:sequence> 
      <xs:element name="editor" type="xs:token" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:sequence>
</xs:complexType>

There is no way to obtain an extension of the xs:choice such as:

<xs:complexType name="personsAsWeWouldHaveLiked">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="author"/>
    <xs:element ref="character"/>
    <xs:element name="editor" type="xs:token"/>
  </xs:choice>
</xs:complexType>
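
The practical difference shows up in instance documents. The fragments below are a sketch that assumes a hypothetical element declaration `<xs:element name="persons" type="persons"/>` and the extension-based definitions of author and character; the editor names are invented for the illustration:

```xml
<!-- Valid per the derived type: all authors and characters
     come first, then the editors -->
<persons>
  <author id="CMS">
    <name>Charles M Schulz</name>
    <born>1922-11-26</born>
  </author>
  <character id="Snoopy">
    <name>Snoopy</name>
    <born>1950-10-04</born>
    <qualification>extroverted beagle</qualification>
  </character>
  <editor>First Editor</editor>
  <editor>Second Editor</editor>
</persons>

<!-- Not valid per the derived type (but it would be valid per
     personsAsWeWouldHaveLiked): an editor appears before a character -->
<persons>
  <editor>First Editor</editor>
  <character id="Snoopy">
    <name>Snoopy</name>
    <born>1950-10-04</born>
    <qualification>extroverted beagle</qualification>
  </character>
</persons>
```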

The situation with xs:all is even worse: the restrictions on the composition of xs:all still apply. This means you can’t add any content to a complex type defined with a xs:all—although you can still add new attributes—and also you can only use a xs:all compositor in a derivation by extension if the base type has an empty content model.
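
The one extension that remains possible on a xs:all content model can be sketched as follows. The type names baseCharacter and describedCharacter and the species attribute are hypothetical and serve only to illustrate adding an attribute without touching the content:

```xml
<xs:complexType name="baseCharacter">
  <xs:all>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="qualification"/>
  </xs:all>
  <xs:attribute ref="id"/>
</xs:complexType>

<!-- The extension contains no compositor: only a new attribute
     is added to the content model inherited from the base type -->
<xs:complexType name="describedCharacter">
  <xs:complexContent>
    <xs:extension base="baseCharacter">
      <xs:attribute name="species" type="xs:token"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```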

Derivation by restriction

Whereas derivation by extension is similar to merging two content models through a xs:sequence compositor, derivation by restriction is a restriction of the number of instance structures matching the complex type. In this respect, it is similar to the derivation by restriction of simple datatypes or simple content complex types (even though we’ve seen that a facet such as xs:whiteSpace expanded the number of instance documents matching a simple type). Note that this is the only similarity between derivations by restriction of simple and complex datatypes. This is highly confusing, since W3C XML Schema uses the same word and even the same element name in both cases, but these words have a different meaning and the content models of the xs:restriction elements are different.

Unlike simple type derivation, there are no facets to apply to complex types, and the derivation is done by defining the full content model of the derived datatype, which must be a logical restriction of the base type. Any instance structure valid per the derived datatype must also be valid per the base datatype. The W3C XML Schema specification does not define the derivation by restriction in these terms, but defines a formal algorithm to be followed by schema processors, which is roughly equivalent.

The derivation by restriction of a complex type is a declaration of intention that the derived type is a subset of the base type. (Rather than being a derivation in the sense we’ve seen for simple types, this declaration is needed by features allowing substitutions and redefinitions of types, which we will see in Chapter 8 and Chapter 12, and it may provide useful information to some applications.) When we derive simple types, we can take a base type without having to care about the details of the facets that are already applied, and just add our own set of facets. Here, on the contrary, we need to provide a full definition of a content model, except for attributes that can be declared as “prohibited” to be excluded from the restriction, something we have seen for the restriction of complex types with simple contents.

Moving on, let’s try to find a base from which we can derive both the author and character elements by restriction. This time, we can be sure that such a complex type exists since all the complex types can be derived from an abstract xs:anyType, allowing any elements and attributes. In practice, however, we will try to find the most restrictive base type that can accommodate our needs. Since the name and born elements are present in both author and character, with the same number of occurrences, we can keep them as they appear. We then have two elements (dead and qualification), each of which appears in only one of the two elements author and character. Since both author and character will need to be valid per the base type, we will include both of them in the base type but make them optional by giving them a minOccurs attribute equal to 0. Our base type can then be:

<xs:complexType name="person">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
    <xs:element ref="qualification" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>

The derivations are then done by defining the content model within a xs:restriction element (note that we have not repeated the attribute declarations which are not modified):

<xs:element name="author">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="dead" minOccurs="0"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

We see here that the syntax of a derivation by restriction is more verbose than the syntax of the straight definition of the content model. The purpose of this derivation is not to build modular schemas, but rather to give applications that use this schema the indication that there is some commonality between the content models, and if they know how to handle the complex type “person,” they can handle the elements author and character. We will see W3C XML Schema features that rely on this derivation method in Chapter 8 and Chapter 12.

Changing the number of occurrences of particles is not the only modification that can be done during a derivation by restriction. Other operations that result in a reduction of the number of valid instance structures are also possible, such as changing a simple type to a more restrictive one or fixing values. The main constraint in this mechanism is that each particle of the derived type must be an explicit derivation of the corresponding particle of the base type. The effect of this statement is to limit the “depth” of the restrictions that can be performed in a single step, and when we need to restrict particles at a deeper level of imbrication, we may have to transform local definitions into global ones. We will see a concrete example with mixed content models (Section 7.5.1), which are similar in this respect.
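
As a minimal sketch of a restriction that narrows a simple type, the hypothetical type recentCharacter below restricts person so that born must be on or after a given date. It assumes a schema without a target namespace, so that the local born element has the same name as the global one, and the cutoff date is arbitrary:

```xml
<xs:complexType name="recentCharacter">
  <xs:complexContent>
    <xs:restriction base="person">
      <xs:sequence>
        <xs:element ref="name"/>
        <!-- Same element name, but a type derived by restriction
             from the xs:date used in the base type -->
        <xs:element name="born">
          <xs:simpleType>
            <xs:restriction base="xs:date">
              <xs:minInclusive value="1950-01-01"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <!-- dead is omitted and qualification becomes mandatory -->
        <xs:element ref="qualification"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```

Every instance structure accepted by this type is still accepted by person, which is the condition a derivation by restriction has to meet.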

Asymmetry of these two methods

We now have all the elements we need to look back at the claim about the asymmetry of these derivation methods. This lack of symmetry is not a defect as such, but studying it is a good exercise for understanding the meaning of these two derivation methods. Let’s examine the derivation by extension of basePerson into the character element:

<xs:complexType name="basePerson">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="basePerson">
        <xs:sequence>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

The content model of character contains a mandatory qualification element. Valid characters are not valid per basePerson; thus, there is no hope of deriving character back into basePerson by restriction, since all the instance structures that are valid per the derived type must be valid per the base type in a derivation by restriction.

Let’s look back at the derivation by restriction of the person base type into a character element:

<xs:complexType name="person">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
    <xs:element ref="qualification" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>
             
<xs:element name="character">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="person">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="qualification"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

Again, it is not possible to derive the complex type of character into person, since it means changing the number of minimum occurrences of qualification from 1 to 0 and adding an optional dead element between born and qualification. None of these operations are possible during a derivation by extension, which can only append new content after the content of the base type, and can’t update an existing particle (to change the number of occurrences) nor insert a new particle between two existing particles.

Mixed Content Models

Although W3C XML Schema permits mixed content models and describes them better than XML DTDs do, W3C XML Schema treats them as an add-on plugged on top of complex content models. The good news is that this allows control of child elements exactly as we’ve just seen for complex contents. The bad news is that we abandon any control over the child text nodes whose values cannot be constrained at all, and, of course, the descriptions of the child elements are subject to the same limitations as in the case of complex content models. The limitations on unordered content models are probably even more unfriendly for mixed content models, which are more “free style,” than they are for complex content models.

Creating Mixed Content Models

This add-on is implemented through a mixed attribute in the xs:complexType element, which is otherwise used exactly as we’ve seen for complex content models. The effect of this attribute when its value is set to "true" is to allow any text nodes within the content model, before, between, and after the child elements. The location, the whitespace processing, and the datatype of these text nodes cannot be restricted in any way.

Let’s go back to the definition of our title element and change it to accept a reduced version of XHTML with the a link and an em element to highlight some parts of its text. The definition, which was previously done by extending a simple type to create a simple content complex type, needs to be re-written as a complex content definition with a mixed attribute set to "true". The full definition, including the definition of the a element, the definition of a markedText complex type and its usage to define the title element, could be:

<xs:element name="a">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="href" type="xs:anyURI"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
          
<xs:complexType name="markedText" mixed="true">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element name="em" type="xs:token"/>
    <xs:element ref="a"/>
  </xs:choice>
  <xs:attribute ref="lang"/>
</xs:complexType>
          
<xs:element name="title" type="markedText"/>

This definition matches elements such as:

<title lang="en">
  Being a
  <a href="http://dmoz.org/Shopping/Pets/Dogs/">
    Dog
  </a>
  Is a
  <em>
    Full-Time
  </em>
  Job
</title>

Note that the length of the title can no longer be restricted.

Derivation of Mixed Content Models

Mixed content models are derived exactly like the complex content models on which they have been plugged. The semantics of both methods stay exactly the same.

Derivation by extension

Mixed content complex types can be derived by extension from other mixed content complex types, and the meaning is the same as for complex content. If I want to add a strong element to my markedText mixed content type, I can define the following content model:

<xs:element name="title">
  <xs:complexType mixed="true">
    <xs:complexContent mixed="true">
      <xs:extension base="markedText">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element name="strong" type="xs:string"/>
        </xs:choice>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

One must note, though, that this extension is equivalent to:

<xs:complexType name="resultingType" mixed="true">
  <xs:sequence>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="em" type="xs:token"/>
      <xs:element ref="a"/>
    </xs:choice>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="strong" type="xs:string"/>
    </xs:choice>
  </xs:sequence>
  <xs:attribute ref="lang"/>
</xs:complexType>

This is probably not what we would like to see in practice since this content model expects to see all the occurrences of a and em before any instance of strong. We will see later, in Chapter 12, that this specific issue can be solved using a feature named “substitution groups” instead of using xs:choice.
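
The two fragments below sketch the consequence for instance documents validated against the extended title definition. They are separate fragments, and the wording of the titles is only an illustration:

```xml
<!-- Valid: every a and em element comes before the first strong -->
<title lang="en">
  Being a <a href="http://dmoz.org/Shopping/Pets/Dogs/">Dog</a>
  Is a <em>Full-Time</em> <strong>Job</strong>
</title>

<!-- Not valid: an em element appears after a strong element -->
<title lang="en">
  <strong>Being</strong> a Dog Is a <em>Full-Time</em> Job
</title>
```

Text nodes remain allowed anywhere in both cases; only the relative order of the child elements is constrained.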

Derivation by restriction

The derivation of mixed content models by restriction is also done using the method defined for complex content models, with the same constraint that each particle must be an explicit derivation of the corresponding particle of the base type. To illustrate the consequences of this constraint, let’s look again at the definition and the use of our markedText:

<xs:element name="a">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="href" type="xs:anyURI"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
             
<xs:complexType name="markedText" mixed="true">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element name="em" type="xs:token"/>
    <xs:element ref="a"/>
  </xs:choice>
  <xs:attribute ref="lang"/>
</xs:complexType>
             
<xs:element name="title" type="markedText"/>

If we want to forbid em elements in our title, force the href to be an http absolute URI, and require the lang attribute to be either en or es, we need to do some refactoring to show that the a element included in our title is an explicit derivation of the general definition of a. We also need to use a global complex type definition for a instead of the previous anonymous definition:

<xs:element name="a" type="link"/>
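
The global link type referenced here is not reproduced in the listing above. A definition consistent with the earlier anonymous type of the a element, given as a sketch rather than as the definitive version, would be:

```xml
<!-- Same content as the former anonymous complex type of a,
     now given a name so that it can serve as a base type -->
<xs:complexType name="link">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="href" type="xs:anyURI"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
```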

We can now either derive a new global complex type from the new link complex type or embed its derivation in the definition of our title element:

<xs:element name="title">
  <xs:complexType mixed="true">
    <xs:complexContent mixed="true">
      <xs:restriction base="markedText">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element name="a">
            <xs:complexType>
              <xs:simpleContent>
                <xs:restriction base="link">
                  <xs:attribute name="href">
                    <xs:simpleType>
                      <xs:restriction base="xs:anyURI">
                        <xs:pattern value="http://.*"/>
                      </xs:restriction>
                    </xs:simpleType>
                  </xs:attribute>
                </xs:restriction>
              </xs:simpleContent>
            </xs:complexType>
          </xs:element>
        </xs:choice>
        <xs:attribute name="lang">
          <xs:simpleType>
            <xs:restriction base="xs:language">
              <xs:enumeration value="en"/>
              <xs:enumeration value="es"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

This example is a caricature. In practice it would be more readable to create an intermediate global type definition to avoid embedding several derivations, but it provides an overview of this derivation process.

Derivation between complex and mixed content models

Since complex and mixed content models are built using the same mechanism, one may wonder what the possibilities are for deriving complex contents from mixed contents and vice versa. The answer to this question lies in the semantics of these two derivation methods.

Derivation by extension appends new content after the content of the base type and the structure of the base type is kept unchanged. It is therefore not possible to derive a mixed content model from a complex content model: when a content model is mixed, the position of the text nodes cannot be constrained, so the derived type would permit text nodes at any location within the content inherited from the base type. For the same reason, it is impossible to extend a mixed content model into a complex content model because the text nodes that are allowed in the base type would become forbidden.

Derivation by restriction defines a subset of the base type. It is forbidden to derive a mixed content model from a complex content model, since the resulting type would allow text nodes that are forbidden in the base type and would expand rather than restrict the content model. The last combination is the only workable one: a mixed content model can be restricted into a complex content model. Forbidding the text nodes of a mixed content model is a valid restriction and can be done by setting the mixed attribute to “false” in the xs:complexType definition. It is even possible to derive a mixed content model into a simple content model since this is, in fact, a restriction removing the child elements and keeping the text nodes. This assumes, of course, that the child elements are optional; i.e., they have a minOccurs attribute equal to 0.
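
A restriction of a mixed content model into a complex content model could be sketched as follows. The type name markupOnly is hypothetical; the content model simply repeats that of markedText while switching off the mixed attribute:

```xml
<!-- mixed="false" forbids the text nodes that markedText allowed;
     the child elements and the lang attribute are kept as they were -->
<xs:complexType name="markupOnly" mixed="false">
  <xs:complexContent>
    <xs:restriction base="markedText">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="em" type="xs:token"/>
        <xs:element ref="a"/>
      </xs:choice>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```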

Empty Content Models

Empty content models describe elements that can only accept attributes. W3C XML Schema does not include any special support for empty content models, which can be considered either complex content models without elements or simple content models with a value restricted to the empty string.

Creation of Empty Content Models

W3C XML Schema considers empty content models to be the intersection between complex content models (in the case in which no compositors are specified) and simple content models (in the case in which no text nodes are expected, which W3C XML Schema handles as if an empty text node were found). We can, therefore, choose between the two methods to create an empty content model. When we extended our title element to become mixed content, we carefully avoided adding empty elements, such as the HTML img or br. Let’s see how we could define a br element with its id and class attributes using both methods.

As simple content models

This is done by defining a simple type that can only accept the empty string as a value. Strictly speaking, empty content models do not accept any whitespace between their start and end tags. Since we want to control this, we must use a datatype that does not alter whitespace, i.e., xs:string. Our empty content model is then derived by extension from this simple type:

<xs:simpleType name="empty">
  <xs:restriction base="xs:string">
    <xs:enumeration value=""/>
  </xs:restriction>
</xs:simpleType>
             
<xs:element name="br">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="empty">
        <xs:attribute name="id" type="xs:ID"/>
        <xs:attribute name="class" type="xs:NMTOKEN"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

As complex content models

The other (more straightforward) way to do this is to create a complex content model without any subelements:

<xs:element name="br">
  <xs:complexType>
    <xs:attribute name="id" type="xs:ID"/>
    <xs:attribute name="class" type="xs:NMTOKEN"/>
  </xs:complexType>
</xs:element>
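
To make the effect concrete, here are a few instance fragments, wrapped in a hypothetical examples element only so that the listing is well formed. The attribute values are illustrative, and the comments describe how we would expect either definition of br to treat each fragment:

```xml
<examples>
  <!-- Expected to be valid: no content at all. -->
  <br id="b1" class="clear"/>

  <!-- Expected to be valid: start and end tags with nothing between them. -->
  <br id="b2"></br>

  <!-- Expected to be invalid: a single space between the tags. -->
  <br id="b3"> </br>

  <!-- Expected to be invalid: text content. -->
  <br id="b4">text</br>
</examples>
```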

Derivation of Empty Content Models

Each of the two empty content types keeps the derivation methods of its content model (simple or complex). The main difference between the two is essentially a matter of which derivations may be applied to the base type and what effects they will have.

Derivation by extension

If we compare what we’ve learned about deriving complex and simple contents by extension, we can see that both allow the addition of new attributes to the complex type. However, while we can add new subelements to complex content, we cannot change the type of the text node for a simple content model. This is the first difference between the two methods: when the empty content model is built on a simple type, it is not possible to add anything other than attributes, while if it is built on top of a complex type, it is possible to extend it to accept elements.
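
A minimal sketch of the second case follows. The names emptyBrComplex, brWithNote, note, and title are hypothetical and serve only to show an empty complex content model being extended with both an element and an attribute:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Empty complex content model: attributes only. -->
  <xs:complexType name="emptyBrComplex">
    <xs:attribute name="id" type="xs:ID"/>
    <xs:attribute name="class" type="xs:NMTOKEN"/>
  </xs:complexType>

  <!-- Extension of the empty complex content model: an optional
       note element and a title attribute are added. -->
  <xs:complexType name="brWithNote">
    <xs:complexContent>
      <xs:extension base="emptyBrComplex">
        <xs:sequence>
          <xs:element name="note" type="xs:token" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute name="title" type="xs:token"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Had the base type been the empty simple content model, only the xs:attribute line could have been added in the extension.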

Derivation by restriction

At first glance, it seems that there are fewer differences here. The restriction methods of both simple and complex contents allow restricting the scope of the attributes; restricting the content, which is already empty, doesn’t seem to be very interesting. It’s time, though, to remember what we’ve learned about a simple type derivation facet that actually extends the set of valid instance documents! The “empty” simple type that we created to derive our empty simple content model has a base type equal to xs:string. When this simple type is derived through xs:whiteSpace, the result may be an expansion of the set of valid instance structures. In our case, setting xs:whiteSpace to “collapse” has the effect of accepting any sequence of whitespace between the start and end tags. This new type is not “empty,” strictly speaking, but may be useful for some (if not most) applications that normalize whitespace and do not distinguish between these two cases. Such a derivation can be done on the simple content complex type like this:

<xs:simpleType name="empty">
  <xs:restriction base="xs:string">
    <xs:enumeration value=""/>
  </xs:restriction>
</xs:simpleType>
             
<xs:complexType name="emptyBr">
  <xs:simpleContent>
    <xs:extension base="empty">
      <xs:attribute name="id" type="xs:ID"/>
      <xs:attribute name="class" type="xs:NMTOKEN"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
             
<xs:complexType name="almostEmptyBr">
  <xs:simpleContent>
    <xs:restriction base="emptyBr">
      <xs:whiteSpace value="collapse"/>
      <xs:attribute name="id" type="xs:ID"/>
      <xs:attribute name="class" type="xs:NMTOKEN"/>
    </xs:restriction>
  </xs:simpleContent>
</xs:complexType>

Simple or Complex Content Models for Empty Content Models?

As we have seen, choosing a simple or complex type doesn’t make an awful lot of difference, except for extensibility. If we want to keep the possibility of adding subelements by derivation in the content model, we’d better choose an empty complex content model. However, if we want to be able to accept whitespace in a derived type, an empty simple content model is a better bet.

Back to Our Library

We’ve covered so much ground in this chapter that it’s not obvious which features would be the most beneficial! The choice also depends on external factors, such as the level of W3C XML Schema support available in the tools that will be used. For instance, some tools that produce Java classes or bindings may take advantage of complex type derivation by restriction. This is the path we will follow for now. We will create a complex type with complex content that is a superset of the content models of author and character, and then derive both by restriction. First, we can also define an empty content model with an id attribute, which can be derived by extension for all the content models that have an id attribute:

<xs:complexType name="elementWithID">
  <xs:attribute ref="id"/>
</xs:complexType>

Note that we cannot use this type directly to define the book element, since its id attribute is a restriction of xs:ID:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="isbn"/>
      <xs:element ref="title"/>
      <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/> 
      <xs:element ref="character" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="id" type="bookID"/>
    <xs:attribute ref="available"/>
  </xs:complexType>
</xs:element>

To use our elementWithID complex type to define the book element, we need to derive by extension a complex type corresponding to the complex type of book without the restriction on the id attribute. The following code is quite verbose, but it is shown here as an exercise:

<xs:complexType name="bookTmp">
  <xs:complexContent>
    <xs:extension base="elementWithID">
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/> 
        <xs:element ref="author" minOccurs="0"
          maxOccurs="unbounded"/> 
        <xs:element ref="character" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="available"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
       
<xs:element name="book">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="bookTmp">
        <xs:sequence>
          <xs:element ref="isbn"/>
          <xs:element ref="title"/> 
          <xs:element ref="author" minOccurs="0"
            maxOccurs="unbounded"/> 
          <xs:element ref="character" minOccurs="0"
            maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="id" type="bookID"/>
        <xs:attribute ref="available"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

A more concise option is to derive by restriction first:

<xs:complexType name="elementWithBookID">
  <xs:complexContent>
    <xs:restriction base="elementWithID">
      <xs:attribute name="id" type="bookID"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>


<xs:complexType name="book">
  <xs:complexContent>
    <xs:extension base="elementWithBookID">
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/> 
        <xs:element ref="author" minOccurs="0"
          maxOccurs="unbounded"/> 
        <xs:element ref="character" minOccurs="0"
          maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="available"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Using elementWithID to derive a personType by extension, which can then be used to derive the author and character elements by restriction, is straightforward, if not concise. We have already seen this example. The full schema is then:

<?xml version="1.0"?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="string255">
    <xs:restriction base="xs:token">
      <xs:maxLength value="255"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="string32">
    <xs:restriction base="xs:token">
      <xs:maxLength value="32"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="isbn">
    <xs:restriction base="xs:NMTOKEN">
      <xs:length value="10"/>
      <xs:pattern value="[0-9]{9}[0-9X]"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="bookID">
    <xs:restriction base="xs:ID">
      <xs:pattern value="b[0-9]{9}[0-9X]"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="supportedLanguages">
    <xs:restriction base="xs:language">
      <xs:enumeration value="en"/>
      <xs:enumeration value="es"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="date">
    <xs:restriction base="xs:date">
      <xs:pattern value="[^:Z]*"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="name" type="string32"/>
  <xs:element name="qualification" type="string255"/>
  <xs:element name="born" type="date"/>
  <xs:element name="dead" type="date"/>
  <xs:element name="isbn" type="isbn"/>
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="available" type="xs:boolean"/>
  <xs:attribute name="lang" type="supportedLanguages"/>
  <xs:complexType name="elementWithID">
    <xs:attribute ref="id"/>
  </xs:complexType>
  <xs:complexType name="bookTmp">
    <xs:complexContent>
      <xs:extension base="elementWithID">
        <xs:sequence>
          <xs:element ref="isbn"/>
          <xs:element ref="title"/> 
          <xs:element ref="author" minOccurs="0"
            maxOccurs="unbounded"/> 
          <xs:element ref="character" minOccurs="0"
            maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute ref="available"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="personType">
    <xs:complexContent>
      <xs:extension base="elementWithID">
        <xs:sequence>
          <xs:element ref="name"/>
          <xs:element ref="born"/>
          <xs:element ref="dead" minOccurs="0"/>
          <xs:element ref="qualification" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="string255">
          <xs:attribute ref="lang"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:complexContent>
        <xs:restriction base="bookTmp">
          <xs:sequence>
            <xs:element ref="isbn"/>
            <xs:element ref="title"/> 
            <xs:element ref="author" minOccurs="0"
              maxOccurs="unbounded"/> 
            <xs:element ref="character" minOccurs="0"
              maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute name="id" type="bookID"/>
          <xs:attribute ref="available"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="author">
    <xs:complexType>
      <xs:complexContent>
        <xs:restriction base="personType">
          <xs:sequence>
            <xs:element ref="name"/>
            <xs:element ref="born"/>
            <xs:element ref="dead" minOccurs="0"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="character">
    <xs:complexType>
      <xs:complexContent>
        <xs:restriction base="personType">
          <xs:sequence>
            <xs:element ref="name"/>
            <xs:element ref="born"/>
            <xs:element ref="qualification"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
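
As a quick check of the schema, here is a small instance document that is intended to conform to it. The values are illustrative sample data: the book identifier follows the bookID pattern, the dates carry no time zone, and author omits qualification while character omits dead:

```xml
<?xml version="1.0"?>
<library>
  <book id="b0836217462" available="true">
    <isbn>0836217462</isbn>
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <!-- author: name, born, and an optional dead element. -->
    <author id="CMS">
      <name>Charles M Schulz</name>
      <born>1922-11-26</born>
      <dead>2000-02-12</dead>
    </author>
    <!-- character: name, born, and a mandatory qualification. -->
    <character id="PP">
      <name>Peppermint Patty</name>
      <born>1966-08-22</born>
      <qualification>bold, brash and tomboyish</qualification>
    </character>
  </book>
</library>
```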

Derivation or Groups

Since the derivation methods for complex types do not widen the scope of structures that can be defined by W3C XML Schema and are rather complex, their usage is controversial. Kohsuke Kawaguchi has published a convincing article on XML.com (http://www.xml.com/pub/a/2001/06/06/schemasimple.html) that explains how to avoid using complex type derivations without losing much in modularity.
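
To give an idea of what a group-based alternative might look like, here is a sketch that describes author and character without any complex type derivation. It is not taken from the article; the group names personCore and identified are hypothetical, and the element types are simplified to built-in datatypes to keep the listing short:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:token"/>
  <xs:element name="born" type="xs:date"/>
  <xs:element name="dead" type="xs:date"/>
  <xs:element name="qualification" type="xs:token"/>

  <!-- Elements shared by author and character. -->
  <xs:group name="personCore">
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
    </xs:sequence>
  </xs:group>

  <!-- Attributes shared by every element that carries an id. -->
  <xs:attributeGroup name="identified">
    <xs:attribute name="id" type="xs:ID"/>
  </xs:attributeGroup>

  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="personCore"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
      <xs:attributeGroup ref="identified"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="personCore"/>
        <xs:element ref="qualification"/>
      </xs:sequence>
      <xs:attributeGroup ref="identified"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Each element states its full content model in one place, and the shared parts are reused by reference rather than by restricting a common superset.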
