Chapter 22. XML Schema (XSDL) Tad Tougher Tutorial

  • XML Schema definition language

  • Syntax and declarations

  • Simple and complex types

  • Locally-scoped elements

  • Schema inclusion

An XML DTD is a specific case of a more general concept called a schema definition. The dictionary defines schema as a “general conception of what is common to all members of a class.” A schema definition takes that “conception” and turns it into something concrete that can be used directly by a computer.

There are many types of schema in use in the computer industry, chiefly for databases. DTDs are different in that the class for which they declare “what is common to all members” is a class of XML documents.

The popularity of XML has brought DTDs to entirely new constituencies. The database experts and programmers who are taking to XML in droves are examining it from the standpoint of their own areas of expertise and familiar paradigms.

All of these creative folks have ideas about what could be done differently. The World Wide Web Consortium has incorporated these ideas into a design for an enhanced schema definition facility called the XML Schema definition language (XSDL).

The name of the language is often shortened to “XML Schema”, but we reserve that phrase for the name of the W3C spec. We call the language XSDL so you’ll know when we are referring to the “schema definition language” as opposed to:

  • a particular “schema definition”,

  • a conceptual “schema”, or

  • the XML Schema specification.

The words “XML schema”, unfortunately, could refer to any of those three things.

Caution

The XML Schema spec is several times longer than the XML specification itself. It is also quite intricate and formal. This chapter will informally teach a subset that is sufficiently functional for most projects and yet simple enough that we can teach it all in one chapter.

A simple sample schema

Let’s start our explanation of XSDL by introducing a sample schema definition.

The sample XSDL definition in Example 22-2 demonstrates some of the most important features of the language. Example 22-1 shows a document that conforms to that schema.

Example 22-1. Poem document

<?xml version="1.0"?>
<poem xmlns="http://www.poetry.net/poetns"
    publisher="Boni and Liveright" pubyear="1922">
<title>The Waste Land</title>
<picture href="pic1.gif"/>
<verse>April is the cruellest month, breeding</verse>
<verse>Lilacs out of the dead land</verse>
</poem>

The schema defines a poem element type that consists of a title element followed by a picture and one or more verse elements. The poem element type has two optional attributes: publisher and pubyear.

The picture element type’s required href attribute is declared with a type of xsd:anyURI, which indicates that the value of href must have a URI datatype.

There are several notable characteristics of the syntax:

  • Perhaps the most obvious is that a schema definition is represented as an XML document.

  • There is a dependency on namespaces, which are heavily utilized.[1] The schema element declares the prefix xsd for names defined in XML Schema and poem for names that are defined within this schema definition.

  • There is a built-in syntax that can be used to declare datatypes: the type attribute. Actually, it is used for more than datatypes; it is arguably the most important concept in XSDL, as we’ll soon see.

Elements and types

Notice the four element elements[2] in Example 22-2: poem, title, verse and picture. They declare the properties of a class of elements:

  1. the element-type name;

  2. the data structure of the content; and

  3. attributes provided for the class, including attribute types and default values.

Example 22-2. Poem schema definition in XSDL

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:poem="http://www.poetry.net/poetns"
   targetNamespace="http://www.poetry.net/poetns">
  <xsd:element name="poem">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="poem:title"/>
        <xsd:element ref="poem:picture"/>
        <xsd:element ref="poem:verse" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="publisher" type="xsd:string"/>
      <xsd:attribute name="pubyear" type="xsd:NMTOKEN"/>
    </xsd:complexType>
   </xsd:element>
  <xsd:element name="title" type="xsd:string"/>
  <xsd:element name="verse" type="xsd:string"/>
  <xsd:element name="picture">
    <xsd:complexType>
      <xsd:attribute name="href" use="required" type="xsd:anyURI"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

In XSDL, a complexType element can define and name a data structure and attributes independently of declaring an element type. There is also a simpleType element, but it can define and name only the simplest data structures: various forms of character strings, such as datatypes.

The unqualified word type in XSDL is reserved for just these two types: complex types and simple types.[3]

An element element, then, declares the element-type name and data structure type of a class of elements. The type could be defined within the content of the element element, as in the case of poem in Example 22-2, in which case the type itself is not named.

Alternatively, if the type was defined and named elsewhere, the declaration could use a type attribute, as shown for the title element. As the xsd prefix in the attribute value suggests, the string type is not actually defined within this schema. XSDL automatically provides named simple type definitions for all of the built-in datatypes. Those names are in the XML Schema namespace.

The two methods of declaring data structure types for elements are equivalent, and are equally applicable to complex and simple types. That includes user-derived datatypes, which, as we saw in 21.2, “Defining user-derived datatypes”, on page 451, are actually simple types.

Structure of a schema definition

A schema is defined by one or more schema documents. Their root element is a schema element. Its attributes can define applicable namespaces, and its content components include elements like those we have been discussing, plus annotation elements.

Namespaces

The schema element must have a declaration for the XML Schema namespace, http://www.w3.org/2001/XMLSchema. It could either assign a prefix (xsd and xs are two popular ones) or it could make XML Schema the default namespace. The prefix (if any) is used both for schema component elements and in references to built-in datatypes.

To validate documents that use namespaces, you can specify a targetNamespace for the schema, as shown in Example 22-2. Components that are children of the schema element are called global schema components. They declare and define items in the schema’s target namespace.

The example also declares the poem prefix for the schema’s target namespace. It is used within the schema definition to refer to the elements, attributes and types declared (or defined) by global schema components.

The instance document in Example 22-1 utilizes the same namespace, http://www.poetry.net/poetns. However, as it is declared as the default namespace, no prefix is declared or used.
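The same document could equally bind the namespace to a prefix. The sketch below rewrites Example 22-1 that way; the prefix p is an arbitrary choice made only for illustration. The attributes stay unprefixed because they are declared locally inside their element types, and locally declared attributes are unqualified unless the schema says otherwise.

```xml
<?xml version="1.0"?>
<!-- Same namespace as Example 22-1, bound to the (arbitrary) prefix p -->
<p:poem xmlns:p="http://www.poetry.net/poetns"
    publisher="Boni and Liveright" pubyear="1922">
<p:title>The Waste Land</p:title>
<p:picture href="pic1.gif"/>
<p:verse>April is the cruellest month, breeding</p:verse>
<p:verse>Lilacs out of the dead land</p:verse>
</p:poem>
```

To a schema processor the two documents are equivalent: what matters is the namespace name, not the prefix.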

Note that the value of the name attribute of a component, such as a type definition or element declaration, does not have a namespace prefix. A component name always belongs to the target namespace. It is only when the declared or defined objects are referenced that the prefix may be used.

Within a namespace different kinds of components can normally have the same name. The exception is simple and complex types, as there are many places where they are treated interchangeably. Elements, however, are not types, so an element declaration component may use the same name as a complex or simple type. There is no more relationship between them than between a guy named Bob at your office and the guy named Bob on your favorite television show (unless you work in Hollywood!).

Schema components

The XSDL elements we have been discussing, such as element and simpleType, occur in the content of a schema element and are collectively known as schema components. Those, like element, that correspond to DTD declarations, are also called (surprise!) declaration components.

As XSDL schemas are themselves defined in XML documents, it was possible to provide techniques to make them self-documenting and capable of being processed by applications other than schema processors. These include unique identifiers, extension attributes, and annotation elements.

Unique identifiers

All schema components are defined with an optional id attribute. You can therefore assign unique identifiers to make the components easier to refer to using XPath expressions. Each value assigned to an id attribute must be different from any other assigned anywhere in the schema document.

Extension attributes

Schema components may be extended with arbitrary attributes in any namespace other than the XML Schema namespace. For instance you could add attributes from the XLink namespace or from the RDF namespace.

If you had software that helped you to visualize the schema, extension attributes could be used to store the graphical coordinates of the various elements. If you used software that converted XML schemas to a relational database schema, you might use the extension attributes to guide that process.
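As a rough sketch of the visualization case, the fragment below attaches coordinates to two element declarations. The viz prefix, its namespace URI and the x and y attribute names are all invented for this illustration; they do not belong to any real tool.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:viz="http://www.example.com/viz"
   targetNamespace="http://www.poetry.net/poetns">
  <!-- viz:x and viz:y are hypothetical extension attributes;
       they have no effect on validation -->
  <xsd:element name="title" type="xsd:string" viz:x="120" viz:y="40"/>
  <xsd:element name="verse" type="xsd:string" viz:x="120" viz:y="90"/>
</xsd:schema>
```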

The annotation element

Any XSDL component may have an annotation element as its first sub-element. The schema element, however, goes above and beyond the call of duty! It may have as many annotation sub-elements as you like. It is good practice to have at least one annotation at the beginning as an introduction to the document type.

An annotation element may have zero or more documentation and appinfo child elements.

The documentation element is used to add user-readable information to the schema. Any elements are permitted; they needn’t be defined in the schema. The benefit of using annotation elements rather than XML comments is that it is much easier to use rich markup such as XHTML or DocBook. Application software can extract this documentation and use it for online help or other purposes.

An appinfo element adds some information specific to a particular application. These are extension elements; they work like the extension attributes we discussed earlier. You may use them for the same sorts of tasks, but the elements can have an internal structure while attributes can only contain data characters. Your extension elements should be in a namespace that will enable your applications to recognize them.
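Here is a minimal sketch of an annotated element declaration. The documentation uses XHTML markup; the ed:widget element and its namespace are hypothetical, standing in for whatever your own application would expect to find.

```xml
<xsd:element name="verse" type="xsd:string">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      <p xmlns="http://www.w3.org/1999/xhtml">A single line of
      the poem. Use one <code>verse</code> element per line.</p>
    </xsd:documentation>
    <xsd:appinfo>
      <!-- ed:widget is a hypothetical extension element for
           an imaginary editing tool -->
      <ed:widget xmlns:ed="http://www.example.com/editor"
                 kind="single-line"/>
    </xsd:appinfo>
  </xsd:annotation>
</xsd:element>
```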

Complex types

Example 22-3 shows the definition of an address type. It also shows two element declarations that utilize it.

Example 22-3. Elements built on an address type

<xsd:complexType name="address">
   <xsd:sequence>
     <xsd:element ref="myns:line1"/>
     <xsd:element ref="myns:line2"/>
     <xsd:element ref="myns:city"/>
     <xsd:element ref="myns:state"/>
     <xsd:element ref="myns:zip"/>
   </xsd:sequence>
   <xsd:attribute name="id" type="xsd:ID"/>
</xsd:complexType>
<xsd:element name="billingAddress" type="myns:address"/>
<xsd:element name="shippingAddress" type="myns:address"/>

In XSDL, types are definable independently of elements and may be associated with more than one element-type name. In the example, the address type is used by both billingAddress and shippingAddress.

This example shows some of the power of complex types: we can create structural definitions as reusable units that make element declaration and maintenance easier. Types are similar to the virtual or abstract classes used in object-oriented programming.

Types do not themselves define elements that will be used directly. Example 22-3 would not permit an address element in a valid document. Instead the type is a set of reusable constraints that can be used as a building block in element declarations and other type definitions.

XSDL does not require you to give every type a name. If you only intend to use a type once, you could put the definition for it right in an element declaration, as in Example 22-4.

Example 22-4. Inline type definition

<xsd:element name="address">
  <xsd:complexType>
   <xsd:sequence>
     <xsd:element ref="myns:line1"/>
     <xsd:element ref="myns:line2"/>
     <xsd:element ref="myns:city"/>
     <xsd:element ref="myns:state"/>
     <xsd:element ref="myns:zip"/>
   </xsd:sequence>
   <xsd:attribute name="id" type="xsd:ID"/>
  </xsd:complexType>
</xsd:element>

Example 22-4 was created from Example 22-3 by wrapping the complexType in an element and moving the name attribute. You could do the same with a simpleType. Note that the element declaration has no type attribute. You need to choose whether to refer to a named type or embed an unnamed type definition.

To create a type that allows character data in addition to whatever is specified in its content model, you may add a mixed="true" attribute value to the complexType element.
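As an illustration, the sketch below declares a paragraph-like element whose content may freely mix text with emphasis and citation elements. The names para, emph and cite are assumptions made for this example; we assume emph and cite are declared globally elsewhere in the schema.

```xml
<xsd:element name="para">
  <!-- mixed="true" permits character data between the child elements -->
  <xsd:complexType mixed="true">
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="myns:emph"/>
      <xsd:element ref="myns:cite"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
```

Without mixed="true", the same content model would accept only emph and cite elements, with nothing but whitespace between them.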

To declare the element-type empty, we could have left out the sequence element.

Content models

Content models allow us to describe what content is allowed within an element.

Sequences

A sequence is specified in Example 22-5. It indicates that there must be an A element followed by a B element followed by a C element.

Example 22-5. sequence element

<xsd:sequence>
   <xsd:element ref="myns:A"/>
   <xsd:element ref="myns:B"/>
   <xsd:element ref="myns:C"/>
</xsd:sequence>

An element element might declare things directly, or else indirectly by referencing an existing element declaration. The declarations in the example do the latter, as indicated by the use of ref attributes instead of name attributes. Note that an element reference must be prefixed if it lives in a namespace (which it will if declared in a schema with a targetNamespace).

Choices

Example 22-6 shows the XSDL code that defines a choice of element types. It means the element must contain either an A or a B or a C.

Example 22-6. choice element

<xsd:choice>
   <xsd:element ref="myns:A"/>
   <xsd:element ref="myns:B"/>
   <xsd:element ref="myns:C"/>
</xsd:choice>

Nested model groups

For more complex content models, model groups can be nested. For example, we can specify a choice element within a sequence element, as shown in Example 22-7.

Example 22-7. Sequence with nested choice

<xsd:sequence>
  <xsd:element ref="poem:title"/>
  <xsd:element ref="poem:picture"/>
  <xsd:element ref="poem:verse" maxOccurs="unbounded"/>
  <xsd:choice>
    <xsd:element ref="poem:footnotes"/>
    <xsd:element ref="poem:bibliography"/>
  </xsd:choice>
</xsd:sequence>

The declaration for verse states that it may have multiple occurrences through its maxOccurs attribute. There is a corresponding minOccurs that defaults to “1” – meaning at least one is required by default.
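Like minOccurs, maxOccurs defaults to “1”, so a component with neither attribute must occur exactly once. The following variation on Example 22-7 is a sketch of how the two attributes combine; the limit of 100 verses is arbitrary.

```xml
<xsd:sequence>
  <xsd:element ref="poem:title"/>
  <!-- minOccurs="0": the picture may be omitted -->
  <xsd:element ref="poem:picture" minOccurs="0"/>
  <!-- at least 1 (the default minimum) and at most 100 verses -->
  <xsd:element ref="poem:verse" maxOccurs="100"/>
  <!-- occurrence attributes also apply to model groups:
       here the whole choice is optional -->
  <xsd:choice minOccurs="0">
    <xsd:element ref="poem:footnotes"/>
    <xsd:element ref="poem:bibliography"/>
  </xsd:choice>
</xsd:sequence>
```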

Inside of sequences and choices it is also possible to use any and group elements. An any element means that any content is allowed. It has various bells and whistles to allow you to narrow down what you mean by “any”. Most document types do not require this feature so we will not go into any detail.

The group element allows you to refer to a named “model group definition”. You can use these model group definitions to reuse parts of content models by referencing them.
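A sketch of a model group definition and a reference to it follows. The names backMatter and poemBody are invented for this example. A named group must be a child of the schema element; it is then referenced by its qualified name.

```xml
<!-- Named model group definition (a global schema component) -->
<xsd:group name="backMatter">
  <xsd:choice>
    <xsd:element ref="poem:footnotes"/>
    <xsd:element ref="poem:bibliography"/>
  </xsd:choice>
</xsd:group>

<!-- A content model that reuses the group by reference -->
<xsd:complexType name="poemBody">
  <xsd:sequence>
    <xsd:element ref="poem:title"/>
    <xsd:element ref="poem:verse" maxOccurs="unbounded"/>
    <xsd:group ref="poem:backMatter"/>
  </xsd:sequence>
</xsd:complexType>
```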

all elements

The all element specifies that all of the contained elements must be present, but their order is irrelevant. So you could enter “A B C”, “A C B”, “B A C” and all of the other orderings of the three. Example 22-8 demonstrates.

Example 22-8. all element

<xsd:complexType name="testAll">
  <xsd:all>
    <xsd:element ref="myns:A"/>
    <xsd:element ref="myns:B"/>
    <xsd:element ref="myns:C"/>
  </xsd:all>
</xsd:complexType>

all must only be used at the top level of a complex type definition. all is also unique in that it may only contain element elements, not sequences, choices, groups, etc.

Attributes

The poem and picture element declarations in Example 22-2 both contain attribute declarations. Example 22-9 shows the declarations for the poem element’s optional publisher and pubyear attributes.

Example 22-9. Attribute declarations

<xsd:attribute name="publisher" type="xsd:string"/>
<xsd:attribute name="pubyear" type="xsd:NMTOKEN"/>

They are optional because there is no use attribute in their definitions. You can also make them required with use="required".

Example 22-10 shows two attribute declarations. One uses a built-in datatype and the other a user-defined simple type.

Example 22-10. Built-in and user-defined types

<xsd:attribute name="href" use="required" type="xsd:anyURI"/>
<xsd:attribute name="pubdate" type="myns:pubyear"/>
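The pubyear type referenced above is not defined in this chapter. Purely as an assumption about what such a definition might look like, here is a sketch that restricts the built-in gYear datatype; the lower bound is arbitrary.

```xml
<!-- Hypothetical definition; the chapter does not show the real one -->
<xsd:simpleType name="pubyear">
  <xsd:restriction base="xsd:gYear">
    <xsd:minInclusive value="1450"/>
  </xsd:restriction>
</xsd:simpleType>
```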

Attribute declarations can also occur within a named attributeGroup element, which allows them to be reused in complex type definitions and in other attribute groups.
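For instance, the two poem attributes could be gathered into an attribute group and then referenced from the poem type, roughly as sketched here. The group name pubInfo is our own invention.

```xml
<!-- Named attribute group (a global schema component) -->
<xsd:attributeGroup name="pubInfo">
  <xsd:attribute name="publisher" type="xsd:string"/>
  <xsd:attribute name="pubyear" type="xsd:NMTOKEN"/>
</xsd:attributeGroup>

<xsd:element name="poem">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="poem:title"/>
      <xsd:element ref="poem:picture"/>
      <xsd:element ref="poem:verse" maxOccurs="unbounded"/>
    </xsd:sequence>
    <!-- Pulls in both attributes by reference -->
    <xsd:attributeGroup ref="poem:pubInfo"/>
  </xsd:complexType>
</xsd:element>
```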

attribute elements have a default attribute that allows you to specify a default value for optional attributes. To supply a default value that cannot be overridden, supply it using the fixed attribute rather than the default attribute.
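The two declarations below sketch the difference. The lang attribute and both values are made up for the example.

```xml
<!-- If the attribute is absent, the value "unknown" is supplied -->
<xsd:attribute name="publisher" type="xsd:string" default="unknown"/>
<!-- If present, the value must be "en"; if absent, "en" is supplied -->
<xsd:attribute name="lang" type="xsd:language" fixed="en"/>
```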

Declaring schema conformance

How does an XML document tell a processor that it conforms to a particular XSDL schema definition? Usually it doesn’t!

In theory, you can determine which schema definition to use from the root element type, file type, or other cues. In practice, the namespace is typically used.

There is a convention specified in the XML Schema specification to allow the document author to give a more explicit hint to the receiver. Example 22-11 demonstrates.

Example 22-11. Referring to a schema definition

<myns:mydoc
    xmlns:myns="http://www.myns.com/myns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.myns.com/myns
                        http://www.mysite.com/myxsdl.xsd">

</myns:mydoc>

Note the declaration of the xsi namespace prefix. It identifies a namespace that is specifically for putting XML Schema information into instance documents. There is a global attribute in this namespace called schemaLocation that allows a document to point to an appropriate schema definition.

The attribute value is defined as a list of paired URIs. The first one in each pair is a namespace URI. The second one is the URI for a schema document. As the schema processor works its way through the instance document, it can find the applicable schema for an element or attribute by looking up its namespace.
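A document that mixes namespaces can list one pair per namespace. In this sketch the second pair is hypothetical: we assume, for illustration only, that the poem schema is published at the URI shown.

```xml
<myns:mydoc
    xmlns:myns="http://www.myns.com/myns"
    xmlns:poem="http://www.poetry.net/poetns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.myns.com/myns
                        http://www.mysite.com/myxsdl.xsd
                        http://www.poetry.net/poetns
                        http://www.poetry.net/poem.xsd">

</myns:mydoc>
```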

The sender of a document may also provide the receiver with a schema through an API, command line, or graphical interface.

Although there is nothing wrong with using these hints just to check whether a document is valid, often you want to check whether it validates against some particular schema. In that case you don’t want your software to use hints; you want it to use the schema you’ve provided.

The manner in which you tell the software what schema to use for a particular namespace will depend on the software. One convention is merely to configure the software with a list of schemas. The software can read the schemas and collect the list of target namespaces from the targetNamespace attributes. Then, when it sees a particular namespace in a document it can use the appropriate schema to validate it. Because you provide the list of schemas in the beginning, you know exactly what schemas are being used to validate no matter what is in the document.

Schema inclusion

The schema inclusion facility allows a schema definition to treat another schema definition’s contents as part of its own. Example 22-12 uses the include element to incorporate declarations from the schema in Example 22-13. The declarations are thenceforth treated as part of the book.xsd schema.

Example 22-12. book.xsd schema definition including declarations from common.xsd

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:myns="http://www.myns.net/myns"
   targetNamespace="http://www.myns.net/myns">
  <xsd:include schemaLocation="common.xsd"/>
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="myns:title"/>
        <xsd:element ref="myns:chapter" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="chapter">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="myns:title"/>
        <xsd:element ref="myns:par" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Example 22-13. common.xsd schema definition

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:myns="http://www.myns.net/myns"
   targetNamespace="http://www.myns.net/myns">
  <xsd:element name="title" type="xsd:string"/>
  <xsd:element name="par" type="xsd:string"/>
</xsd:schema>

In addition to inclusion, XSDL also has support for importing and redefinition of other schemas. Importing is for combining schemas that describe different namespaces. A redefinition allows you to include another schema and override bits and pieces of the included schema. For instance you could redefine the type of an element or attribute that you are including.
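Importing might look something like the following sketch, in which a schema for the myns namespace borrows the poem element from the poem schema's namespace. The anthology element and the file name poem.xsd are assumptions made for this example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:myns="http://www.myns.net/myns"
        xmlns:poem="http://www.poetry.net/poetns"
   targetNamespace="http://www.myns.net/myns">
  <!-- include: same target namespace as this schema -->
  <xsd:include schemaLocation="common.xsd"/>
  <!-- import: a different target namespace;
       poem.xsd is a hypothetical file name -->
  <xsd:import namespace="http://www.poetry.net/poetns"
              schemaLocation="poem.xsd"/>
  <xsd:element name="anthology">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="myns:title"/>
        <xsd:element ref="poem:poem" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Note that the imported namespace needs its own prefix declaration so that its components can be referenced.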

Additional capabilities

We’ll now describe some additional functions briefly. We’ll try to provide just enough detail to allow you to decide whether to investigate them further.

Locally-scoped elements

Element-type names in a schema are normally global; any element type can be referenced in any other element type’s content model. So if you define title you can use it in chapters, sections and anywhere else you see fit.

Once you have defined your schema, authors can use the title element in each of the contexts that you have specified. In each of those contexts the element type is exactly the same: it has the same name, attributes and allowed content.

XSDL has a facility that allows you to say that titles in one context should have a different attribute set and content model from titles in another – even if they are in the same namespace! In effect, you can declare two element types with the same name. The name is bound to a different element-type definition in each context.

You can do this by declaring an element type within the declaration for another element type. Example 22-14 shows two different title element types declared within the same schema. Each has its own unnamed complex type: one consists only of attributes, the other only of sub-elements.

Example 22-14. Two locally-scoped title declarations

<xsd:element name="book"><xsd:complexType>
  <xsd:sequence>
    <xsd:element name="title"><xsd:complexType>
      <xsd:attribute name="booktitle" use="required"
                     type="xsd:string"/>
      <xsd:attribute name="ISBN" use="required"
                     type="myns:ISBNFormat"/>
    </xsd:complexType></xsd:element>
    <xsd:element ref="myns:chapter"/>
  </xsd:sequence>
</xsd:complexType></xsd:element>

<xsd:element name="employee"><xsd:complexType>
  <xsd:sequence>
    <xsd:element name="empId"/>
    <xsd:element name="title"><xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="myns:jobtitle"/>
        <xsd:element ref="myns:company"/>
      </xsd:sequence>
    </xsd:complexType></xsd:element>
  </xsd:sequence>
</xsd:complexType></xsd:element>

In documents conforming to this schema, a title element within a book element must conform to the title element declared within the book element declaration, complete with the required ISBN and booktitle attribute values.

A title element within an employee element, however, must conform to the title element declared inside the example’s employee element declaration.
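The fragments below sketch what the two kinds of title might look like in a document. The values are invented, including the placeholder ISBN, which we simply assume satisfies ISBNFormat. We also assume jobtitle and company hold plain strings and that myns is bound to the namespace used in Example 22-12. The title elements are unprefixed because local element declarations are unqualified by default; a schema that sets elementFormDefault="qualified" would place them in the target namespace instead.

```xml
<!-- As a child of book: attributes only, no content -->
<title booktitle="The Waste Land" ISBN="0-00-000000-0"/>

<!-- As a child of employee: element content, no attributes -->
<title xmlns:myns="http://www.myns.net/myns">
  <myns:jobtitle>Editor</myns:jobtitle>
  <myns:company>Boni and Liveright</myns:company>
</title>
```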

Type derivation

Type derivation is the creation of a new type as a variation of an existing one (or a combination of several existing ones). This is much like the way object-oriented classes inherit from other classes. The derived type will have a content model that is an extension of the base type’s.

We discussed derivation of simple types in 21.2, “Defining user-derived datatypes”, on page 451. In Example 22-15 we see the derivation of a complex type, internationalAddr. It adds a new child element, called countryCode, to the address type we defined in Example 22-3.

Example 22-15. One type extends another

<xsd:complexType name="internationalAddr">
  <xsd:complexContent>
    <xsd:extension base="myns:address">
      <xsd:sequence>
          <xsd:element ref="myns:countryCode"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

With this definition, an internationalAddr is just like an address, but after specifying the details of the address you must also specify a countryCode.

You can also derive a type by restriction. That means that you add constraints, such as making an attribute or subelement required when it was previously optional. This is very similar to the equivalent concept for datatypes.
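As a sketch, the type below restricts the address type of Example 22-3 so that the id attribute becomes required. The name identifiedAddress is our own. Unlike an extension, a restriction restates the content model it wants to keep.

```xml
<xsd:complexType name="identifiedAddress">
  <xsd:complexContent>
    <xsd:restriction base="myns:address">
      <!-- The content model is repeated unchanged -->
      <xsd:sequence>
        <xsd:element ref="myns:line1"/>
        <xsd:element ref="myns:line2"/>
        <xsd:element ref="myns:city"/>
        <xsd:element ref="myns:state"/>
        <xsd:element ref="myns:zip"/>
      </xsd:sequence>
      <!-- Optional in the base type, required here -->
      <xsd:attribute name="id" type="xsd:ID" use="required"/>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>
```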

Identity constraints

There could be several elements in a document that logically represent a set. For example, records of the employees in a company might be represented as elements in an XML document.

Each element in a set must have a unique name or key that distinguishes it from all other elements in the set. For example, the employee records key could be an empid attribute or sub-element.

These identity constraints can get even more complicated: we might wish to declare that there must be no two customers with the same first name, last name, and address.

XSDL has sophisticated features for defining unique keys and the means of referencing them. It uses XPath for this purpose. For instance you could define the list of customers with one XPath expression and use a second to describe how each member of the set is unique.[4]
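To give a flavor of the syntax, here is a sketch of both kinds of constraint. The company element, the customer sub-elements and the constraint names are all hypothetical; we assume that empid is an attribute of employee and that each field selected for a customer has simple content. In each constraint, the selector's XPath expression identifies the members of the set and the field expressions say what must be distinct among them.

```xml
<xsd:element name="company">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="myns:employee" maxOccurs="unbounded"/>
      <xsd:element ref="myns:customer" minOccurs="0"
                   maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Every employee must have an empid, and no two may share one -->
  <xsd:key name="employeeKey">
    <xsd:selector xpath="myns:employee"/>
    <xsd:field xpath="@empid"/>
  </xsd:key>
  <!-- No two customers may agree on all three fields -->
  <xsd:unique name="customerIdentity">
    <xsd:selector xpath="myns:customer"/>
    <xsd:field xpath="myns:firstName"/>
    <xsd:field xpath="myns:lastName"/>
    <xsd:field xpath="myns:address"/>
  </xsd:unique>
</xsd:element>
```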

Conclusion

XSDL is a sophisticated tool in the toolbox of schema developers. It has the virtue of supporting modern ideas of inheritance and namespaces. At the same time it is controversial because it is so large and complex. The subset described in this chapter should be both useful and manageable.[5]



[1] Namespaces are discussed in Chapter 16, “Namespaces”, on page 376.

[2] Yes, element elements.

[3] In fact the XSDL spec barely mentions any other types (element, attribute, data, etc.), perhaps to avoid confusion with the unqualified use of “type”.

[4] The discussion in 18.6.7, “Keys”, on page 410 gives an idea of the problem and the approach to solving it.

[5] For the whole story, we recommend Priscilla Walmsley’s Definitive XML Schema, published in this series.
