Chapter 2. A quick tour of XML Schema

This chapter provides a quick tour of the main components of XML Schema. It also introduces a simple example of a schema and a conforming instance that will be used and built upon throughout the book.

2.1. An example schema

Suppose you have the instance shown in Example 2–1. It consists of a product element that has two children (number and size) and an attribute (effDate).

Example 2–1. Product instance


<product effDate="2001-04-12">
  <number>557</number>
  <size>10</size>
</product>


Example 2–2 shows a schema that might be used to validate our instance. Its three element declarations and one attribute declaration assign names and types to the components they declare.

Example 2–2. Product schema


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="SizeType"/>
    </xs:sequence>
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>


2.2. The components of XML Schema

Schemas are made up of a number of components of different kinds, listed in Table 2–1. All of the components of XML Schema are discussed in detail in this book, in the chapters indicated in Table 2–1.

Table 2–1. XML Schema components


2.2.1. Declarations vs. definitions

Schemas contain both declarations and definitions. The term declaration is used for components that can appear in the instance and be validated by name. This includes elements, attributes, and notations. The term definition is used for other components that are internal to the schema, such as complex and simple types, model groups, attribute groups, and identity constraints. Throughout this book, you will see the terms “element declaration” and “type definition,” but not “element definition” or “type declaration.”

The order of declarations and definitions in the schema document is insignificant. A declaration can refer to other declarations or definitions that appear before or after it, or even those that appear in another schema document.

2.2.2. Global vs. local components

Components can be declared (or defined) globally or locally. Global components appear at the top level of a schema document, and they are always named. Their names must be unique, within their component type, within the entire schema. For example, it is not legal to have two global element declarations with the same name in the same schema. However, it is legal to have an element declaration and a complex type definition with the same name.

Local components, on the other hand, are scoped to the definition or declaration that contains them. Element and attribute declarations can be local, which means their scope is the complex type in which they are declared. Simple types and complex types can also be locally defined, in which case they are anonymous and cannot be used by any element or attribute declaration other than the one in which they are defined.
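To make these scoping rules concrete, here is a minimal sketch. It assumes, purely for illustration, that the complex type is also named product; that name and the anonymous type for size are not part of Example 2–2.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global element declaration: top level, always named -->
  <xs:element name="product" type="product"/>

  <!-- Global complex type that shares the element's name (hypothetical);
       legal because elements and types are different kinds of components -->
  <xs:complexType name="product">
    <xs:sequence>
      <!-- Local element declarations: scoped to this complex type -->
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size">
        <!-- Local, anonymous simple type: usable only by this size -->
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="2"/>
            <xs:maxInclusive value="18"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
    <!-- Local attribute declaration -->
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
</xs:schema>
```

Adding a second global element declaration named product to this schema document would be an error, whereas the shared element and type name is accepted.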

2.3. Elements and attributes

Elements and attributes are the basic building blocks of XML documents. The instance in Example 2–1 contains three elements (product, number, and size) and one attribute (effDate). As a result, the schema contains three element declarations and one attribute declaration. The product element declaration is global, since it appears at the top level of the schema document. The other two element declarations, as well as the attribute declaration, are local, and their scope is the ProductType type in which they are declared. Elements and attributes are covered in detail in Chapters 6 and 7, respectively.

2.3.1. The tag/type distinction

Each of the elements and attributes is associated with a type. XML Schema separates the concepts of elements and attributes from their types. This allows using different names for data that is structurally the same. For example, you can write two element declarations, shippingAddress and billingAddress, which have the exact same structure but different names. You are only required to define one type, AddressType, and use it in both element declarations. In addition to using different names, you can place the corresponding elements in different places in the document. A shippingAddress element may only be relevant in the shipment information section of a purchase order, while a billingAddress may appear only in the billing section.
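The address scenario might be sketched as follows. The children street and city are assumptions made for this illustration; the text above names only AddressType and the two element declarations.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- One type definition describes the shared structure.
       The children street and city are hypothetical. -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Two element declarations with different names use the same type -->
  <xs:element name="shippingAddress" type="AddressType"/>
  <xs:element name="billingAddress" type="AddressType"/>
</xs:schema>
```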

You can also have two element declarations with the same name, but different types, in different contexts. For example, a size element can contain an integer when it is a child of shirt, or a value S, M, or L when it is a child of hat.
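One way to express the shirt and hat case is with two local declarations of size, each scoped to a different parent. This is a sketch only; the anonymous complex types for shirt and hat are assumptions chosen to keep the example short.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shirt">
    <xs:complexType>
      <xs:sequence>
        <!-- Within shirt, size holds an integer -->
        <xs:element name="size" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="hat">
    <xs:complexType>
      <xs:sequence>
        <!-- Within hat, size holds one of the values S, M, or L -->
        <xs:element name="size">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="S"/>
              <xs:enumeration value="M"/>
              <xs:enumeration value="L"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```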

2.4. Types

Types allow for validation of the content of elements and the values of attributes. They can be either simple types or complex types. The term “type” is used throughout this book to mean “simple or complex type.”

2.4.1. Simple vs. complex types

Elements that have been assigned simple types have character data content, but no child elements or attributes. Example 2–3 shows the size, comment, and availableSizes elements that have simple types.

By contrast, elements that have been assigned complex types may have child elements or attributes. Example 2–4 shows the size, comment, and availableSizes elements with complex types.

Example 2–3. Elements with simple types


<size>10</size>
<comment>Runs large.</comment>
<availableSizes>10 large 2</availableSizes>


Example 2–4. Elements with complex types


<size system="US-DRESS">10</size>
<comment>Runs <b>large</b>.</comment>
<availableSizes><size>10</size><size>2</size></availableSizes>


Attributes always have simple types, not complex types. This makes sense, because attributes themselves cannot have children or other attributes. Example 2–5 shows some attributes that have simple types.

Example 2–5. Attributes with simple types


system="US-DRESS"
availableSizes="10 large 2"


2.4.2. Named vs. anonymous types

Types can be either named or anonymous. Named types are always defined globally (at the top level of a schema document) and are required to have a unique name. Anonymous types, on the other hand, must not have names. They are always defined entirely within an element or attribute declaration, and may only be used once, by that declaration. The two types in Example 2–2 are both named types. An anonymous type is shown in Example 2–6.

Example 2–6. Anonymous type


<xs:element name="size">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>


2.4.3. The type definition hierarchy

XML Schema allows types to be derived from other types. In Example 2–2, the simple type SizeType is derived from the integer simple type. A complex type can also be derived from another type, either simple or complex. It can either restrict or extend the other type. For example, you could define a complex type UKAddressType that extends AddressType to add more children.

The derivation of types from other types forms a type definition hierarchy. Derived types are related to their ancestors and inherit qualities from them. They can also be substituted for each other in instances. If the shippingAddress element declaration refers to the type AddressType, a corresponding element can also have the type UKAddressType in the instance.

This is very powerful because applications designed to process generic AddressType elements can also process UKAddressType elements without caring about the differences. Other processors that do care about the differences between them can distinguish between the different types.
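A small sketch of such a hierarchy follows. The children street, city, and postcode are hypothetical, as are the sample values in the instance.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Base type; its children are assumptions for this illustration -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Derived type: extends AddressType with one more child -->
  <xs:complexType name="UKAddressType">
    <xs:complexContent>
      <xs:extension base="AddressType">
        <xs:sequence>
          <xs:element name="postcode" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- The declaration refers to the base type only -->
  <xs:element name="shippingAddress" type="AddressType"/>
</xs:schema>
```

In an instance, the xsi:type attribute signals that the derived type is being substituted for the declared one:

```xml
<shippingAddress xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:type="UKAddressType">
  <street>10 High Street</street>
  <city>London</city>
  <postcode>SW1A 1AA</postcode>
</shippingAddress>
```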

2.5. Simple types

2.5.1. Built-in simple types

Forty-nine simple types are built into the XML Schema recommendation. These simple types represent common data types such as strings, numbers, date and time values, and also include types for each of the valid attribute types in XML DTDs. The built-in types are summarized in Table 2–2 and discussed in detail in Chapter 11.

Table 2–2. Built-in simple type summary


Example 2–2 assigned the built-in simple type integer to the number elements, and the built-in simple type date to the effDate attribute.

2.5.2. Restricting simple types

New simple types may be derived from other simple types by restricting them. Example 2–2 showed the definition of a simple type SizeType that restricts the built-in type integer. We applied the facets minInclusive and maxInclusive to restrict the valid values of the size elements to be between 2 and 18. Using the fourteen facets that are part of XML Schema, you can specify a valid range of values, constrain the length and precision of values, enumerate a list of valid values, or specify a regular expression that valid values must match. These fourteen facets are summarized in Table 2–3. Chapter 8 explains how to derive new simple types.
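Beyond the range facets used in SizeType, a few of the other facets might be applied as in the following sketch. All four type names and the specific facet values are assumptions made for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- enumeration: a fixed list of valid values -->
  <xs:simpleType name="SMLSizeType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- pattern: a regular expression that valid values must match;
       here, three digits, a hyphen, and two uppercase letters -->
  <xs:simpleType name="ProdNumType">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d{3}-[A-Z]{2}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- maxLength: constrains the length of a string value -->
  <xs:simpleType name="ShortCommentType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- totalDigits and fractionDigits: constrain the precision -->
  <xs:simpleType name="PriceType">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="6"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```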

Table 2–3. Facets


2.5.3. List and union types

Most simple types, including those we have seen so far, are atomic types. They contain values that are indivisible, such as 10. There are two other varieties of simple types: list and union types.

List types have values that are whitespace-separated lists of atomic values, such as <availableSizes>10 large 2</availableSizes>.

Union types may have values that are either atomic values or list values. What differentiates them is that the set of valid values, or “value space,” for the type is the union of the value spaces of two or more other simple types. For example, to represent a dress size, you may define a union type that allows a value to be either an integer from 2 through 18 or one of the string values small, medium, or large.

List and union types are covered in Chapter 10.
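The dress size union and the availableSizes list described above could be sketched as follows. The type names DressSizeType and AvailableSizesType are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Union: an integer from 2 through 18, or small, medium, or large -->
  <xs:simpleType name="DressSizeType">
    <xs:union>
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="2"/>
          <xs:maxInclusive value="18"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType>
        <xs:restriction base="xs:token">
          <xs:enumeration value="small"/>
          <xs:enumeration value="medium"/>
          <xs:enumeration value="large"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

  <!-- List: whitespace-separated DressSizeType values, such as 10 large 2 -->
  <xs:simpleType name="AvailableSizesType">
    <xs:list itemType="DressSizeType"/>
  </xs:simpleType>

  <xs:element name="availableSizes" type="AvailableSizesType"/>
</xs:schema>
```

Using the union as the item type of the list is what allows a value like 10 large 2 to mix integers and size names.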

2.6. Complex types

2.6.1. Content types

The “content” of an element is the character data and child elements that are between its tags. There are four types of content for complex types: simple, element-only, mixed, and empty. The content type is independent of attributes; all of these content types allow attributes. Example 2–7 shows the instance elements size, product, letter, and color that have complex types. They represent the four different content types.

Example 2–7. Elements with complex types


<size system="US-DRESS">10</size>

<product>
  <number>557</number>
  <size>10</size>
</product>

<letter>Dear <custName>Priscilla Walmsley</custName>...</letter>

<color value="blue"/>


• The size element has simple content, because it contains only character data.

• The product element has element-only content, because it has child elements, but no character data content.

• The letter element has mixed content, because it has both child elements and character data content.

• The color element has empty content, because it does not have any content (just attributes).
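On the schema side, the four content types correspond to four shapes of complex type definition. The sketch below is one possible set of definitions for the elements in Example 2–7; all four type names, and the choice of xs:token and xs:string for the attributes, are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Simple content: character data plus an attribute (for size) -->
  <xs:complexType name="MeasuredSizeType">
    <xs:simpleContent>
      <xs:extension base="xs:integer">
        <xs:attribute name="system" type="xs:token"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- Element-only content: child elements, no character data (for product) -->
  <xs:complexType name="SimpleProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Mixed content: mixed="true" allows character data
       around the child elements (for letter) -->
  <xs:complexType name="LetterType" mixed="true">
    <xs:sequence>
      <xs:element name="custName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Empty content: no content model at all, only an attribute (for color) -->
  <xs:complexType name="ColorType">
    <xs:attribute name="value" type="xs:string"/>
  </xs:complexType>
</xs:schema>
```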

2.6.2. Content models

The order and structure of the child elements of a complex type are known as its content model. Content models are defined using a combination of model groups, element declarations or references, and wildcards. In Example 2–2, the content model of ProductType was a single sequence model group containing two element declarations. There are three kinds of model groups:

• sequence groups require that the child elements appear in the order specified.

• choice groups allow any one of several child elements to appear.

• all groups allow child elements to appear in any order.

These groups can be nested and may occur multiple times, allowing you to create sophisticated content models. Example 2–8 shows a more complex content model for ProductType. Instances of this new definition of ProductType must have a number child, optionally followed by up to three children which may be either size or color elements, followed by any one element from another namespace.

Example 2–8. More complicated content model


<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:choice minOccurs="0" maxOccurs="3">
      <xs:element name="size" type="SizeType"/>
      <xs:element name="color" type="ColorType"/>
    </xs:choice>
    <xs:any namespace="##other"/>
  </xs:sequence>
  <xs:attribute name="effDate" type="xs:date"/>
</xs:complexType>


An any element is known as a wildcard, and it allows for open content models. There is an equivalent wildcard for attributes, anyAttribute, which allows any attribute to appear in a complex type.
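As a sketch of both wildcards together, the following variant of ProductType accepts extra elements and extra attributes from other namespaces. The processContents and occurrence settings are choices made for this illustration, not part of Example 2–8.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <!-- Element wildcard: zero or more elements from other namespaces;
           lax processing validates them only if declarations are found -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="effDate" type="xs:date"/>
    <!-- Attribute wildcard: additional attributes from other namespaces -->
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:complexType>
</xs:schema>
```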

2.6.3. Deriving complex types

Complex types may be derived from other types either by restriction or by extension.

Restriction, as the name suggests, restricts the valid contents of a type. The values for the new type are a subset of those for the base type. All values of the restricted type are also valid according to the base type.

Extension allows for adding additional child elements and/or attributes to a type, thus extending the contents of the type. Values of the base type are not necessarily valid for the extended type, since required elements or attributes may be added. Example 2–9 shows the definition of ShirtType that is a complex type extension. It adds another element declaration, color, and another attribute declaration, id, to ProductType. New element declarations or references may only be added to the end of a content model, so instances of ShirtType must have the children number, size, and color, in that order.

Example 2–9. Complex type extension


<xs:complexType name="ShirtType">
  <xs:complexContent>
    <xs:extension base="ProductType">
      <xs:sequence>
        <xs:element name="color" type="ColorType"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
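For contrast with the extension in Example 2–9, a restriction of ProductType might look like the following sketch. The name DatedProductType is hypothetical; ProductType and SizeType are repeated from Example 2–2 so that the schema document stands alone.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Base type and SizeType, as in Example 2-2 -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="SizeType"/>
    </xs:sequence>
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restriction: the content model is restated, and the
       effDate attribute, optional in the base type, becomes required -->
  <xs:complexType name="DatedProductType">
    <xs:complexContent>
      <xs:restriction base="ProductType">
        <xs:sequence>
          <xs:element name="number" type="xs:integer"/>
          <xs:element name="size" type="SizeType"/>
        </xs:sequence>
        <xs:attribute name="effDate" type="xs:date" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Every instance that is valid for DatedProductType is also valid for ProductType, which is the defining property of a restriction.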


2.7. Namespaces and XML Schema

Namespaces are an important part of XML Schema, and they are discussed in detail in Chapter 3. Example 2–10 shows our now-familiar schema, this time with a target namespace declared. Let’s take a closer look at the attributes of the schema element.

1. The namespace http://www.w3.org/2001/XMLSchema is mapped to the xs: prefix. This indicates that the elements used in the schema document itself, such as schema, element, and complexType, are part of the XML Schema namespace.

2. A target namespace, http://datypic.com/prod, is declared. Any schema document may have a target namespace, which applies to the global (and some local) components declared or defined in it. Although a schema document can only have one target namespace, multiple schema documents with different target namespaces can be assembled together to represent a schema.

3. The target namespace is mapped to the prod prefix.

Example 2–10. Product schema document with target namespace


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod">

  <xs:element name="product" type="prod:ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="prod:SizeType"/>
    </xs:sequence>
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Example 2–11 shows a new instance, where a namespace is declared. In order for an instance to be valid according to a schema, the namespace declaration in the instance must match the target namespace of the schema document.

Example 2–11. Instance with namespace


<prod:product xmlns:prod="http://datypic.com/prod"
              effDate="2001-04-12">
  <number>557</number>
  <size>10</size>
</prod:product>


In this case, only the product element has a prefixed name. This is because the other two elements and the attribute are declared locally. By default, locally declared components do not take on the target namespace. However, this can be overridden by specifying elementFormDefault and attributeFormDefault for the schema document. This is discussed in detail in Chapters 6 and 7.
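To illustrate the override, the sketch below sets elementFormDefault to qualified on a shortened version of the product schema. The size element is typed as xs:integer here only to keep the example brief.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="product" type="prod:ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <!-- Local elements now take on the target namespace -->
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="xs:integer"/>
    </xs:sequence>
    <!-- attributeFormDefault is not set, so effDate stays unqualified -->
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
</xs:schema>
```

A conforming instance must then qualify the child elements as well, while the attribute remains unprefixed:

```xml
<prod:product xmlns:prod="http://datypic.com/prod"
              effDate="2001-04-12">
  <prod:number>557</prod:number>
  <prod:size>10</prod:size>
</prod:product>
```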

2.8. Schema composition

An XSD schema is a set of components such as type definitions and element declarations. Example 2–2 showed a schema document that was used alone to validate an instance. It contained the declarations and definitions for all of the components of the schema.

However, a schema could also be represented by an assembly of schema documents. One way to compose them is through the include and import mechanisms. Include is used when the other schema document has the same target namespace as the “main” schema document. Import is used when the other schema document has a different target namespace. Example 2–12 shows how you might include and import other schema documents.

Example 2–12. Schema composition using include and import


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <xs:include schemaLocation="moreOrderInfo.xsd"/>

  <xs:import namespace="http://datypic.com/prod"
              schemaLocation="productInfo.xsd"/>
  <!--...-->
</xs:schema>


The include and import mechanisms are not the only way for processors to assemble schema documents into a schema. Unfortunately, there is not always a “main” schema document that represents the whole schema. Instead, a processor might join schema documents from various predefined locations, or take multiple hints from the instance. See Chapter 4 for more information on schema composition.

2.9. Instances and schemas

A document that conforms to a schema is known as an instance. An instance can be validated against a particular schema, which may be made up of the schema components defined in multiple schema documents. A number of different ways exist for the schema documents to be located for a particular instance. One way is using the xsi:schemaLocation attribute. Example 2–13 shows an instance that uses the xsi:schemaLocation attribute to map a namespace to a particular schema document.

Using xsi:schemaLocation is not the only way to tell the processor where to find the schema. XML Schema is deliberately flexible on this topic, allowing processors to use different methods for choosing schema documents to validate a particular instance. These methods include built-in schemas, use of internal catalogs, use of the xsi:schemaLocation attribute, and dereferencing of namespaces. Chapter 5 covers the validation of instances in detail.

Example 2–13. Using xsi:schemaLocation


<prod:product xmlns:prod="http://datypic.com/prod"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://datypic.com/prod prod.xsd"
              effDate="2001-04-12">
  <number>557</number>
  <size>10</size>
</prod:product>
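For an instance that has no namespace, such as the one in Example 2–1, the related xsi:noNamespaceSchemaLocation attribute can give the hint instead. In this sketch, prod.xsd is assumed to be a file containing the schema of Example 2–2.

```xml
<!-- prod.xsd is an assumed file name for the schema of Example 2-2 -->
<product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="prod.xsd"
         effDate="2001-04-12">
  <number>557</number>
  <size>10</size>
</product>
```

As with xsi:schemaLocation, this is only a hint, and a processor may choose another way to locate the schema.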


2.10. Annotations

XML Schema provides many mechanisms for describing the structure of XML documents. However, it cannot express everything there is to know about an instance or the data it contains. For this reason, XML Schema allows annotations to be added to almost any schema component. These annotations can contain human-readable information (under documentation) or application information (under appinfo). Example 2–14 shows an annotation for the product element declaration. Annotations are covered in Chapter 21.

Example 2–14. Annotation


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc">
  <xs:element name="product" type="ProductType">
    <xs:annotation>
      <xs:documentation xml:lang="en"
        source="http://datypic.com/prod.html#product">
        <doc:description>This element represents a product.
        </doc:description>
      </xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
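Application information is written the same way, under appinfo. In the sketch below, the namespace http://example.com/app and the dbMapping element are invented for illustration; the content of appinfo is intended for whatever application consumes the schema, not for validating instances.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:app="http://example.com/app">
  <xs:element name="product" type="xs:string">
    <xs:annotation>
      <xs:documentation xml:lang="en">A product for sale.</xs:documentation>
      <xs:appinfo>
        <!-- Hypothetical hint for a consuming application -->
        <app:dbMapping table="PRODUCT_MASTER"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```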


2.11. Advanced features

XML Schema has some more advanced features. These features are available if you need them, but are certainly not an integral part of every schema. Keep in mind that you are not required to use all of XML Schema. You should choose a subset that is appropriate for your needs.

2.11.1. Named groups

XML Schema provides the ability to define groups of element and attribute declarations that are reusable by many complex types. This facility promotes reuse of schema components and eases maintenance. Named model groups are fragments of content models, and attribute groups are bundles of related attributes that are commonly used together. Chapter 15 explains named groups.
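A minimal sketch of both kinds of named group follows. The group names and the elements and attributes they contain are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Named model group: a reusable fragment of a content model -->
  <xs:group name="DescriptionGroup">
    <xs:sequence>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>

  <!-- Attribute group: attributes that are commonly used together -->
  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>

  <!-- A complex type that reuses both groups by reference -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:group ref="DescriptionGroup"/>
    </xs:sequence>
    <xs:attributeGroup ref="IdentifierGroup"/>
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
</xs:schema>
```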

2.11.2. Identity constraints

Identity constraints allow you to uniquely identify nodes in a document and ensure the integrity of references between them. They are similar to the primary and foreign keys in databases. They are described in detail in Chapter 17.
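The database analogy might be sketched as follows, with a key acting as the primary key and a keyref acting as the foreign key. The catalog and order elements, the productNumber attribute, and the constraint names are all assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="number" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="order" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="productNumber" type="xs:integer"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>

    <!-- Like a primary key: every product must have a unique number -->
    <xs:key name="productKey">
      <xs:selector xpath="product"/>
      <xs:field xpath="number"/>
    </xs:key>

    <!-- Like a foreign key: productNumber must match a product number -->
    <xs:keyref name="orderProductRef" refer="productKey">
      <xs:selector xpath="order"/>
      <xs:field xpath="@productNumber"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```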

2.11.3. Substitution groups

Substitution groups are a flexible way to designate certain element declarations as substitutes for other element declarations in content models. If you have a group of related elements that may appear interchangeably in instances, you can reference the substitution group as a whole in content models. You can easily add new element declarations to the substitution groups, from other schema documents, and even other namespaces, without changing the original declarations in any way. Substitution groups are covered in Chapter 16.
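As an illustration, the sketch below makes product the head of a substitution group with shirt and hat as members. The items element and the shortened type definitions are assumptions made to keep the example self-contained.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ShirtType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="size" type="xs:integer"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Head of the substitution group -->
  <xs:element name="product" type="ProductType"/>
  <!-- Members: may appear wherever product is allowed -->
  <xs:element name="shirt" type="ShirtType" substitutionGroup="product"/>
  <xs:element name="hat" type="ProductType" substitutionGroup="product"/>

  <xs:element name="items">
    <xs:complexType>
      <xs:sequence>
        <!-- References the head; shirt and hat are also accepted here -->
        <xs:element ref="product" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A new member can be added later by another declaration with substitutionGroup="product", without touching the declaration of product or items.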

2.11.4. Redefinition and overriding

Redefinition and overriding allow you to define a new version of a schema component while keeping the same name. This is useful for extending or creating a subset of an existing schema document, or overriding the definitions of components in a schema document. Redefinition and overriding are covered in Chapter 18.
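A redefinition might be sketched as follows. The file name prod.xsd is an assumption; it stands for a schema document with no target namespace that contains the SizeType of Example 2–2.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- prod.xsd is assumed to contain the SizeType of Example 2-2 -->
  <xs:redefine schemaLocation="prod.xsd">
    <!-- The new SizeType restricts the original SizeType and replaces it,
         narrowing the upper bound from 18 to 16 -->
    <xs:simpleType name="SizeType">
      <xs:restriction base="SizeType">
        <xs:maxInclusive value="16"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:redefine>
</xs:schema>
```

The redefined type keeps its name, so every component in prod.xsd that refers to SizeType picks up the narrower definition.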


2.11.5. Assertions

Assertions are XPath constraints on XML data, which allow complex validation above and beyond what can be specified in a content model. This is especially useful for co-constraints, where the values or existence of certain child elements or attributes affect the validity of other child elements or attributes. For example, “If the value of newCustomer is false, then customerID must appear.” Chapter 14 covers assertions in detail.
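The co-constraint quoted above could be written as in the following sketch. The order element and the types chosen for its children are assumptions, and the schema requires a processor that supports XSD 1.1.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="newCustomer" type="xs:boolean"/>
        <xs:element name="customerID" type="xs:integer" minOccurs="0"/>
      </xs:sequence>
      <!-- If newCustomer is false, customerID must appear -->
      <xs:assert test="newCustomer = true() or exists(customerID)"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The content model alone can only say that customerID is optional; the assertion ties that optionality to the value of newCustomer.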
