Chapter 3. Giving Some Depth to Our First Schema

Our first schema was very flat, and all its components were defined at the top level. Our second attempt will give it more depth and show how local components may be defined.

Working From the Structure of the Instance Document

For this second schema, we follow the opposite style to the one we used in Chapter 2: we define all the elements and attributes locally, where they appear in the document.

Following the document structure, we start by defining our document element, library. In the earlier schema, this element was defined as:

<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="book" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

In our new schema, we will keep the same construct and the same structure, but we will replace the reference to the book element with the actual definition of this element:

<xs:element name="library">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="book" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="isbn"/>
            <xs:element ref="title"/> 
            <xs:element ref="author" minOccurs="0"
              maxOccurs="unbounded"/> 
            <xs:element ref="character" minOccurs="0"
              maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute ref="id"/>
          <xs:attribute ref="available"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Because the definition of the book element is contained inside the definition of the library element, other book elements could be defined elsewhere in the schema without any risk of confusion, except perhaps for human readers.
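To make this concrete, here is a hypothetical sketch in which two local book definitions with different content models coexist in one schema. The wishlist element is invented for illustration and is not part of our library vocabulary, and the first book definition is trimmed to its isbn child for brevity:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <!-- book under library: element content (trimmed for brevity) -->
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="isbn" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- Hypothetical second document element -->
  <xs:element name="wishlist">
    <xs:complexType>
      <xs:sequence>
        <!-- book under wishlist: an unrelated local definition
             with simple string content -->
        <xs:element name="book" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Since neither book definition is global, each one applies only inside its own parent, and a validator never has to choose between them.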

If all the elements and attributes still referenced in this schema are defined globally, this piece of schema is valid and accurately describes our instance document. The only differences between the first schema and this intermediate step are that the definition of the book element cannot be reused elsewhere and that book can no longer be a document element.

We can repeat the same operation and define all the elements and attributes locally:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="isbn" type="xs:integer"/>
              <xs:element name="title">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string">
                      <xs:attribute name="lang" type="xs:language"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
              <xs:element name="author" minOccurs="0"
                maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="name" type="xs:string"/>
                    <xs:element name="born" type="xs:date"/>
                    <xs:element name="dead" type="xs:date"/>
                  </xs:sequence>
                  <xs:attribute name="id" type="xs:ID"/>
                </xs:complexType>
              </xs:element>
              <xs:element name="character" minOccurs="0"
                maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="name" type="xs:string"/>
                    <xs:element name="born" type="xs:date"/>
                    <xs:element name="qualification" type="xs:string"/>
                  </xs:sequence>
                  <xs:attribute name="id" type="xs:ID"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="id" type="xs:ID"/>
            <xs:attribute name="available" type="xs:boolean"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Apart from an obvious difference in style, this new schema validates the same instance document as the schema in Chapter 2. It is not, strictly speaking, equivalent to the first one, though. It is less reusable (the document element is the only one that could be reused in another schema), and it is stricter, since it validates only documents whose document element is library. Chapter 2’s schema, in contrast, accepts documents whose document element is any of its globally defined elements.
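As an illustration, the following document, made of a lone isbn element, should be accepted by Chapter 2’s schema, where isbn is a global element (as the ref="isbn" in the intermediate step above shows), assuming its declared type accepts this value, as xs:integer or xs:string would. The Russian doll schema rejects it, because library is the only element it declares globally:

<?xml version="1.0"?>
<!-- Valid against the flat schema, invalid against the Russian doll one -->
<isbn>0836217462</isbn>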

Tip

The price we pay for constraining which element may be the document element with W3C XML Schema is a loss of reusability. This limitation has been widely criticized, but the criticism did not change the decision of the specification’s editors. Fortunately, we will see that there are some workarounds that limit this loss for applications that need to constrain the document element.

New Lessons

Although this schema describes the same document as the one in Chapter 2, it illustrates very different aspects of W3C XML Schema.

Depth Versus Modularity?

We have sacrificed the modularity of our first schema to gain the depth and structure of the second one, even though the next chapters present features (named xs:complexType definitions and xs:group) that help restore the balance. This trade-off between depth and modularity is a general tendency in W3C XML Schema.

In practice, you will probably want to keep a balance between these two opposite styles and allow a certain level of depth under several global elements.
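A minimal sketch of such a compromise follows. It is trimmed for brevity (the lang attribute of title and the author and character elements are omitted, so it is not a complete schema for our documents): library and book are global, so book can be reused or serve as a document element, while the elements below book are defined locally.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <!-- Reference to the global book definition -->
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- Global: reusable, and may also be a document element -->
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- Local: defined in place, Russian doll style -->
        <xs:element name="isbn" type="xs:integer"/>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
      <xs:attribute name="available" type="xs:boolean"/>
    </xs:complexType>
  </xs:element>
</xs:schema>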

There are two cases, however, in which these two styles are not equivalent. The first is when elements with the same name need different content models at different locations. Since a global element is identified by its name alone, only one global definition can exist for a given name, so local element definitions must be used (at least at all the locations except one).

In our example, the element name appears both within author and character with the same datatype. We may want to define the element name with different content models in author and character, as in this instance document:

<?xml version="1.0"?>
<library>
  <book id="b0836217462" available="true">
    <isbn>0836217462</isbn>
    <title lang="en">Being a Dog Is a Full-Time Job</title>
    <author id="CMS">
      <name>
        <first>Charles</first>
        <middle>M.</middle>
        <last>Schulz</last>
      </name>
      <born>1922-11-26</born>
      <dead>2000-02-12</dead>
    </author>
    <character id="Snoopy">
      <name>Snoopy</name>
      <born>1950-10-04</born>
      <qualification>extroverted beagle</qualification>
    </character>
  </book>
</library>

Since we can define only one global element named name, we need to define at least one of the name elements locally under its parent.
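One way to do this is sketched below: the author and character definitions from the Russian doll schema, with the name element under author given its own local element content. Making middle optional is an assumption on our part, since the instance above shows only one author:

<xs:element name="author" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <!-- Under author, name has element content -->
      <xs:element name="name">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="first" type="xs:string"/>
            <xs:element name="middle" type="xs:string" minOccurs="0"/>
            <xs:element name="last" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="born" type="xs:date"/>
      <xs:element name="dead" type="xs:date"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>
</xs:element>
<xs:element name="character" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <!-- Under character, name keeps its simple string content -->
      <xs:element name="name" type="xs:string"/>
      <xs:element name="born" type="xs:date"/>
      <xs:element name="qualification" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>
</xs:element>

These two definitions replace the author and character definitions inside the book sequence; both name elements stay local, so neither needs to be declared globally.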

The W3C XML Schema for W3C XML Schema (the “schema for schema”) gives several examples of elements whose type depends on their location, and our Russian doll schema relies on one of them. In the schema for schema, global element definitions have a different type than local definitions or references, even though all of them use the same element name (xs:element).
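The sketch below points out the two kinds of xs:element in a schema shaped like ours. The type names in the comments (topLevelElement and localElement) are those used by the schema for schema. Here book is simplified to string content for brevity:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global declaration, a child of xs:schema (topLevelElement):
       minOccurs, maxOccurs, and ref are not allowed here -->
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <!-- Local declaration (localElement): same element name,
             xs:element, but minOccurs and maxOccurs are allowed, while
             substitutionGroup, abstract, and final are not -->
        <xs:element name="book" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>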

Tip

Whether defining elements with the same name and different datatypes is good practice is subject to discussion. It may confuse human authors and make documentation harder, but through local definitions W3C XML Schema gives the applications that process these documents a way to avoid any confusion. In our example, we have two occurrences of a name element, under author and under character, and it is perfectly possible to define different constraints and even different content models for them. Although this could be presented as overloaded element names (“character/name” versus “author/name”), I find this practice unreliable, since we often don’t have a clear and simple way to identify those two contexts.

The second case is recursive schemas, in which an element can be included within an element of the same type, either directly or indirectly through one of its descendants. In this case, a flat design employing references must be used, since the depth of these recursive structures is unlimited.
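A minimal sketch of such a recursive structure, using a hypothetical section element that is not part of our library vocabulary: section must be global so that its own content model can refer back to it.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global, so that it can be referenced, including by itself -->
  <xs:element name="section">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <!-- Reference back to the global section: sections can nest to
             any depth; minOccurs="0" lets the recursion stop -->
        <xs:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

With local definitions only, each additional level of nesting would require another copy of the definition embedded in the previous one, so the schema itself would fix the maximum depth.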

The W3C XML Schema for W3C XML Schema offers several examples of such elements. Local element definitions can be nested recursively, as they are in our second schema, where an xs:element contains an xs:complexType, which contains an xs:sequence, which contains another xs:element. A flat design must be used for these elements, since they need to be referenced if we don’t want to limit the maximum depth of the structure, and the schema for schema does use a reference mechanism. (The actual mechanism used in this case involves an element group, a feature we have not seen yet, but it is equivalent to an actual reference to an element.)

Russian Doll and Object-Oriented Design

The style of defining elements and attributes locally is often called the Russian doll design, since the definition of each element is embedded in the definition of its parent, in the same way Russian dolls are embedded into each other.

If we look at the Russian dolls through our object-oriented lenses, we may say that the objects are now created locally where they are needed, as opposed to being created globally and cloned when we need them (as was the case in our first schema).

At this point, we still need to learn how we can create types that are the equivalent of classes of objects and containers, and that will let us manipulate sets of objects.

Where Have the Element Types Gone?

Those of you who are familiar with XML (or SGML) and its DTDs are used to identifying elements through the term “element type.” The XML 1.0 Recommendation states that “each element has a type, identified by name.” This is further clarified by the Namespaces in XML specification, which explains that “an XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names.”

A surprising feature of our Russian doll schema is that this fundamental notion of element type has completely disappeared: there is no way to tell what the element type name is. Two different elements have been defined with the name name. Their definitions are independent. They happen to be identical in our example, but they could differ, for instance if we had decomposed the first, middle, and last names for authors but not for characters. The notion of an element type name doesn’t mean anything unless we specify the context in which it is used.

This loss seems to matter so little that few people have even noticed it. There are some situations where we need to identify elements, though, for instance to document XML vocabularies. A convenient way to write a reference manual for an XML vocabulary is to write an index of the element names with their definitions. This becomes much more complex when there is no clear match between element types and their definitions and content models.

Tip

RDF is another application that relies on element types. RDF uses element types to identify elements as resources in its triples: it concatenates the namespace URI and the local name, so the element name of the namespace http://dyomedea.com/ns# is identified as http://dyomedea.com/ns#name. Cutting the link between element types and their schema definitions makes it difficult, if not impossible, to answer basic questions such as “What is the content model of http://dyomedea.com/ns#name?” and “Where can I find its definition?”

I was confronted with this issue when writing the reference guide of this book, since the W3C XML Schema for W3C XML Schema uses many local element definitions. I came to the conclusion that the fact that the same element type (such as xs:restriction, which we will see later on) can have different content models and different semantics depending on its location in a schema makes the language significantly harder to understand and schemas harder to read.
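As a preview, the sketch below uses xs:restriction in two locations where it means different things. The type names isbnType, baseAuthor, and livingAuthor are hypothetical, chosen only for this illustration. Inside xs:simpleType, xs:restriction holds facets; inside xs:complexContent, it restates a narrower content model:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- xs:restriction under xs:simpleType: facets constrain the value -->
  <xs:simpleType name="isbnType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{9}[0-9X]"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="baseAuthor">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="dead" type="xs:date" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <!-- xs:restriction under xs:complexContent: the content model is
       repeated, here without the optional dead element -->
  <xs:complexType name="livingAuthor">
    <xs:complexContent>
      <xs:restriction base="baseAuthor">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>

A reference guide indexed by element name would have a single entry for xs:restriction, yet that entry has to describe both of these content models.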
