Better Validation with Schemas

The DTD mechanism that was included in the XML 1.0 recommendation allows fairly sophisticated document structures to be declared. But when programmers started to apply XML to rigorous data transfer applications (such as importing and exporting data from a relational database), the limitations of DTDs became quite obvious. DTDs cannot restrict what types of character data can be stored in an element, allowing anomalies such as this:

<phone-number>user@example.com</phone-number>

The XML Schema recommendation (approved on May 2, 2001) is intended to address the shortcomings of the XML DTD and to provide capabilities for strict validation of document content. The remainder of this chapter gives a quick introduction to the concepts and facilities provided by XML Schemas. The full XML Schema specification is much longer and more complex than the XML recommendation itself, so if you plan to develop schema documents seriously, you should invest in a good schema reference such as Sams XML Schema Development: An Object-Oriented Approach.

Schema Overview

Interestingly enough, XML Schemas are actual standalone XML documents in their own right. The XML Schema language is a specialized XML application that is designed to describe allowable content in another XML document. Schema-enabled parsers read an XML document, read the associated schema document, and then compare the contents of the target document with the descriptions in the schema.

To illustrate how schemas differ from DTDs, Listing 2.5 shows a sample schema (restaurant.xsd) that duplicates (and extends) the functionality of the DTD located in restaurant.dtd.

Listing 2.5. A Restaurant Sample Schema
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="restaurant">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="name"/>
        <xsd:element name="menu" type="menuType" minOccurs="1"
            maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="menuType">
    <xsd:sequence>
      <xsd:element name="items" type="itemList"/>
    </xsd:sequence>
    <xsd:attribute name="type" type="mealType" use="required"/>
    <xsd:attribute name="start-time" type="xsd:dateTime" use="required"/>
    <xsd:attribute name="end-time" type="xsd:time" use="required"/>
  </xsd:complexType>
  <xsd:simpleType name="mealType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Breakfast"/>
      <xsd:enumeration value="Brunch"/>
      <xsd:enumeration value="Lunch"/>
      <xsd:enumeration value="Dinner"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:complexType name="itemList">
    <xsd:choice maxOccurs="unbounded">
      <xsd:element ref="item"/>
      <xsd:element ref="combo"/>
    </xsd:choice>
  </xsd:complexType>

  <xsd:element name="item">
    <xsd:complexType>
      <xsd:sequence minOccurs="0">
        <xsd:element ref="name"/>
        <xsd:element ref="price"/>
        <xsd:element ref="note" minOccurs="0"/>
      </xsd:sequence>

      <xsd:attribute name="id" type="xsd:ID"/>
      <xsd:attribute name="ref" type="xsd:IDREF"/>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="combo">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="name"/>
        <xsd:element ref="item" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="id" type="xsd:ID" use="required"/>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="name" type="xsd:string"/>

  <xsd:element name="note" type="xsd:string"/>

  <xsd:element name="price" type="currency"/>
  <xsd:simpleType name="currency">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="$\d*\.\d\d"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
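
For comparison, here is a minimal sketch of a document that this schema is intended to accept. The item names, id values, and prices are invented for illustration. The prices assume the currency pattern requires a dollar sign, optional digits, a decimal point, and two digits. Note that the listing declares start-time as xsd:dateTime and end-time as xsd:time, so the sample values follow those types.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document; names, ids, and prices are illustrative -->
<restaurant xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="restaurant.xsd">
  <name>SNL Diner</name>
  <!-- start-time is declared as xsd:dateTime, end-time as xsd:time -->
  <menu type="Breakfast" start-time="2001-05-02T06:00:00" end-time="11:00:00">
    <items>
      <item id="eggs">
        <name>Two Eggs</name>
        <price>$3.25</price>
        <note>Any style</note>
      </item>
      <item id="coffee">
        <name>Coffee</name>
        <price>$1.00</price>
      </item>
      <combo id="early-bird">
        <name>Early Bird</name>
        <!-- Empty items that point at earlier items through the ref attribute -->
        <item ref="eggs"/>
        <item ref="coffee"/>
      </combo>
    </items>
  </menu>
</restaurant>
```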

Schema Elements

The schema in Listing 2.5 shows examples of various elements that make up the XML Schema language. The following sections give a brief explanation of each element and what it is used for.

Note

Each of the following elements belongs to the schema namespace (http://www.w3.org/2001/XMLSchema), which is traditionally assigned to the xsd: namespace prefix.


<xsd:schema>

The <xsd:schema> element is always the top-level element of a valid XML Schema. It may contain top-level element and type declarations. It may also contain top-level attribute, group, and notation declarations, but this advanced usage is not covered here.

<xsd:element>

The <xsd:element> element is used to declare a concrete XML element that may appear in a document. In the preceding example, the first <xsd:element> markup that appears declares the top-level <restaurant> element. There are also top-level declarations for <item>, <combo>, <name>, <note>, and <price> elements.

Note

Although this schema is intended to validate a document that contains a top-level <restaurant> element, legally any element declared in a top-level <xsd:element> can appear as the single, top-level element:

<name xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="restaurant.xsd">SNL Diner</name>


The name attribute gives the name of the element being declared. The declaration may then contain additional markup (such as a complexType or simpleType element) that indicates what type of content the element may contain. It is also possible to reuse type and element declarations through the type and ref attributes. In the declaration for the <menu> element, for example, the type attribute points to the menuType complex type declaration:

<xsd:element name="menu" type="menuType" minOccurs="1"
            maxOccurs="unbounded"/>

The <xsd:element> element is also used within <xsd:complexType> declarations to indicate what type of sub-elements an element can contain (for example, the top-level <item> element declaration in the example).

<xsd:simpleType>

In schema terminology, a simple type describes content that contains neither child elements nor attributes, so it can be used for attribute values and for elements that hold only character data. As in most programming languages, the schema specification provides several built-in simple types, such as string, integer, boolean, and dateTime. These built-in types can also serve as the basis for user-defined derived types (such as an integer restricted to be greater than 0 and less than 1000).
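
The integer restriction mentioned above could be written roughly as follows. This is only a sketch: the tableNumber type and the table element are hypothetical names that do not appear in Listing 2.5.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: an integer greater than 0 and less than 1000 -->
  <xsd:simpleType name="tableNumber">
    <xsd:restriction base="xsd:integer">
      <xsd:minExclusive value="0"/>
      <xsd:maxExclusive value="1000"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Hypothetical element that uses the new type -->
  <xsd:element name="table" type="tableNumber"/>
</xsd:schema>
```

With this declaration, <table>42</table> should validate, while <table>0</table> and <table>1000</table> should not, because both bounds are exclusive.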

<xsd:complexType>

Unlike the <xsd:simpleType> element, the <xsd:complexType> element is used to declare elements that may contain other elements. If a complex type declaration appears at the top level (as a child of the <xsd:schema> element), it must have a name attribute. It can then be referred to by an element declaration that uses the type attribute.

It is also possible to declare anonymous complex types. The <xsd:element> markup that declares the <restaurant> element, for instance, uses an unnamed complexType element to declare a sequence of child elements that it must contain.
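
As a rough illustration of the difference, the anonymous type inside the <restaurant> declaration could instead be given a name and referenced through the type attribute. The name restaurantType is an assumption made for this sketch. The fragment relies on the name element and the menuType type being declared elsewhere in the same schema, as they are in Listing 2.5.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Named complex type (restaurantType is a hypothetical name) -->
  <xsd:complexType name="restaurantType">
    <xsd:sequence>
      <!-- name and menuType are assumed to be declared as in Listing 2.5 -->
      <xsd:element ref="name"/>
      <xsd:element name="menu" type="menuType" minOccurs="1"
          maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- The element declaration now points to the named type -->
  <xsd:element name="restaurant" type="restaurantType"/>
</xsd:schema>
```

The named form is worth the extra markup mainly when more than one element needs the same content model. For a type that is used once, the anonymous form keeps the declaration in one place.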

<xsd:sequence>

In some cases, it is necessary to declare that a list of elements must appear in a particular order. The elements declared in an <xsd:sequence> must appear in the order given (subject to minOccurs and maxOccurs values, explained later).

For example, the <xsd:sequence> element within the <item> element declaration is equivalent to the <!ELEMENT> declaration in the basic restaurant DTD:

<!ELEMENT item (name, price, note?)?>

The sequence is implied by the order of the <xsd:element> elements, and minOccurs="0" is used to simulate the ? DTD syntax, both on the <note> reference and on the sequence as a whole.
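
The following fragment sketches how that declaration plays out in a target document. The item names, ids, and prices are invented for illustration, and the last item is deliberately wrong.

```xml
<items>
  <!-- Full form: name, then price, then an optional note, in that order -->
  <item id="soup">
    <name>Soup of the Day</name>
    <price>$4.50</price>
    <note>Ask your server</note>
  </item>

  <!-- The note may be left out (minOccurs="0" on the note reference) -->
  <item id="salad">
    <name>House Salad</name>
    <price>$5.25</price>
  </item>

  <!-- The whole sequence may be left out (minOccurs="0" on xsd:sequence) -->
  <item ref="soup"/>

  <!-- NOT valid: price appears before name -->
  <item id="bread">
    <price>$2.00</price>
    <name>Bread Basket</name>
  </item>
</items>
```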

<xsd:attribute>

Like the <!ATTLIST> DTD markup, the <xsd:attribute> element is used to declare the names and valid content of element attributes. In addition to the special attribute types defined in XML 1.0 (such as ID and IDREF), schema attributes can be declared to contain any built-in or user-defined simple type (string, integer, and so on).

The attribute declarations contained in the menuType complex type declaration use type and use attributes to indicate what values they may contain and whether the attribute is required. By default (if no use attribute is given), all attributes are optional.
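
A small sketch of the three common cases follows: a required attribute, an optional attribute, and an optional attribute with a default value. The dishType type and its attribute names are hypothetical and are not part of Listing 2.5.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical complex type used only to host the attribute declarations -->
  <xsd:complexType name="dishType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
    </xsd:sequence>
    <!-- Must always be present -->
    <xsd:attribute name="id" type="xsd:ID" use="required"/>
    <!-- Optional, because no use attribute is given -->
    <xsd:attribute name="calories" type="xsd:positiveInteger"/>
    <!-- Optional, and treated as "false" when omitted -->
    <xsd:attribute name="vegetarian" type="xsd:boolean" default="false"/>
  </xsd:complexType>
</xsd:schema>
```

Attribute declarations are placed after the content model (here, the sequence) inside the complex type, as they are in the menuType declaration.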

<xsd:restriction>

The XML Schema language allows new types to be derived from existing types (both simple and complex). The <xsd:restriction> element indicates which type is being restricted (using the base attribute) and what additional restrictions are being imposed on the base type to create the new type.

Note

The various types of restrictions that can be applied are called facets. A complete list of facets and how and where they can be applied can be found in the schema specification itself. See this book's Web site for links to the schema specification.


<xsd:pattern>

The top-level simple type declaration for the currency type restricts the built-in xsd:string type to strings that reflect dollar amounts. Within the restriction element, a pattern element is used to provide a simple regular expression:

<xsd:pattern value="$\d*\.\d\d"/>

This expression indicates that any value that matches the currency type must consist of a dollar sign ($) followed by zero or more digits, a decimal point, and two digits to the right of the decimal point. The schema regular expression syntax is similar to the Perl regular expression syntax, with one notable difference: a schema pattern must always match the entire value, so no anchors are needed and $ is an ordinary character. For a complete reference see the schema specification itself.
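
To make the effect of the pattern concrete, the fragment below lists a few price values and notes whether each would be expected to pass. The item names and ids are invented, and the comments assume the pattern behaves as just described.

```xml
<items>
  <!-- Matches: dollar sign, digits, decimal point, two digits -->
  <item id="a"><name>Toast</name><price>$1.50</price></item>

  <!-- Matches: zero digits before the decimal point is allowed -->
  <item id="b"><name>Mint</name><price>$.25</price></item>

  <!-- Does not match: the dollar sign is missing -->
  <item id="c"><name>Tea</name><price>1.50</price></item>

  <!-- Does not match: only one digit after the decimal point -->
  <item id="d"><name>Jam</name><price>$1.5</price></item>

  <!-- Does not match: the pattern must cover the whole value -->
  <item id="e"><name>Pie</name><price>$3.00 each</price></item>
</items>
```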

<xsd:enumeration>

Another restriction that can be placed on element or attribute content is that the value must appear in an arbitrary list of valid values. Multiple <xsd:enumeration> elements may be used within an <xsd:restriction> element to provide the permitted values. The mealType simple type declaration declares a string-based value that must contain one of the strings "Breakfast", "Brunch", "Lunch", or "Dinner".
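
Enumerations are not limited to strings. As a sketch, the hypothetical spiceLevel type below (not part of Listing 2.5) restricts xsd:integer to four permitted values.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: only the integers 0 through 3 are allowed -->
  <xsd:simpleType name="spiceLevel">
    <xsd:restriction base="xsd:integer">
      <xsd:enumeration value="0"/>
      <xsd:enumeration value="1"/>
      <xsd:enumeration value="2"/>
      <xsd:enumeration value="3"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```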

<xsd:choice>

The choice element is equivalent in some ways to the | operator in the basic DTD element declaration syntax. It indicates that one of the elements within the <xsd:choice> block may appear at that point in the target document. Based on the values of the minOccurs and maxOccurs attributes, however, multiple elements from the list may occur one after another in the target document. The markup for the itemList complex type uses the choice element with maxOccurs="unbounded" to permit any number of <item> and <combo> elements to appear in any order in the target document.
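
The fragment below sketches an <items> element that the itemList type should accept, with <item> and <combo> elements interleaved freely. The names, ids, and prices are invented for illustration.

```xml
<items>
  <combo id="lunch-special">
    <name>Lunch Special</name>
    <!-- IDREF values may point at ids declared later in the document -->
    <item ref="burger"/>
    <item ref="fries"/>
  </combo>
  <item id="burger">
    <name>Burger</name>
    <price>$6.00</price>
  </item>
  <combo id="kids-meal">
    <name>Kids Meal</name>
    <item ref="fries"/>
  </combo>
  <item id="fries">
    <name>Fries</name>
    <price>$2.50</price>
  </item>
</items>
```

Because minOccurs is not set on the choice, it keeps its default of 1, so an empty <items> element would not be accepted.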

minOccurs and maxOccurs Attributes

Throughout the sample schema in Listing 2.5, the minOccurs and maxOccurs attributes are used to control how many times a particular element, sequence, or choice of elements may appear. If these attributes are not present on a particular element, the implied values are 1 and 1 (meaning that the given structure must appear one and only one time). By setting these values to a positive integer or zero, the number of times certain document structures may appear can be controlled. Besides integers, the maxOccurs attribute may contain the special value unbounded, which means that an unlimited number of occurrences may appear.
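
The combinations can be summarized in a short sketch. The sampler element and its children are hypothetical and serve only to show the attribute values side by side.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element used only to illustrate occurrence constraints -->
  <xsd:element name="sampler">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Exactly one: both attributes default to 1 -->
        <xsd:element name="name" type="xsd:string"/>
        <!-- Between two and four -->
        <xsd:element name="item" type="xsd:string"
            minOccurs="2" maxOccurs="4"/>
        <!-- Zero or one, like ? in a DTD -->
        <xsd:element name="note" type="xsd:string" minOccurs="0"/>
        <!-- Zero or more, like * in a DTD -->
        <xsd:element name="tag" type="xsd:string"
            minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

The bounded range on item (two to four) has no direct DTD equivalent; in a DTD it would have to be spelled out by repeating the element name.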
