Using W3C XML Schema

An XML Schema document (XSD), like an XSLT stylesheet, is itself an XML document. It may contain an XML declaration, and must contain a namespace declaration for the URI http://www.w3.org/2001/XMLSchema. This namespace is traditionally mapped to the prefix xs. The document element of an XSD is xs:schema; the simplest possible XSD, therefore, is the following:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" />

Of course, this XSD defines no structure, so it is mostly useless. To be more useful, it should include at least one element, representing the document element of the XML document it describes:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer" />
</xs:schema>

The xs:element element is called a particle. A particle can be thought of as representing a single unit of markup, or a grouping of such units. Other particles include xs:choice, xs:sequence, and xs:all; these three are also compositors, elements that define groups of particles.

The simplest document that is valid against this schema looks like this:

<Customer />

Tip

You may have already noticed that I’ve deviated from the style used in earlier parts of this book by capitalizing the first letter of the Customer element. I’ll be capitalizing the first letter of every element and attribute name in this XSD. Hold that thought! I’ll explain the different style in a little while.

Still not very useful, is it? Let’s add a little more to the XML Schema, customer.xsd:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

The xs:complexType schema element indicates that its enclosing element’s content is more than just simple text; it actually has structure. This can be thought of as the real minimum requirement for using XML Schema, because a schema for a document with an empty document element is not very useful at all.

The xs:sequence element contains an ordered list of elements. Other compositors include xs:choice, which indicates that any one of the listed elements may appear, and xs:all, which indicates that the listed elements may appear in any order.
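As a rough sketch of how the other two compositors might be used, the following schema swaps xs:sequence for xs:choice and xs:all. The Contact, Phone, Email, Dimensions, Width, and Height names are hypothetical, chosen only for illustration; they are not part of customer.xsd.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element: exactly one of Phone or Email must appear -->
  <xs:element name="Contact">
    <xs:complexType>
      <xs:choice>
        <xs:element name="Phone" />
        <xs:element name="Email" />
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <!-- Hypothetical element: Width and Height must both appear, in either order -->
  <xs:element name="Dimensions">
    <xs:complexType>
      <xs:all>
        <xs:element name="Width" />
        <xs:element name="Height" />
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```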

With xs:sequence, I’ve now defined a document structure that looks like this:

<Customer>
  <Name>Amalgamated Construction</Name>
</Customer>
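One common way to hint at which schema an instance document should be validated against is the xsi:noNamespaceSchemaLocation attribute. The sketch below assumes that customer.xsd sits in the same directory as the document; whether the hint is honored depends on the validator you use.

```xml
<?xml version="1.0"?>
<!-- xsi:noNamespaceSchemaLocation is only a hint; a validator may ignore it -->
<Customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="customer.xsd">
  <Name>Amalgamated Construction</Name>
</Customer>
```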

In order to constrain the number of times an element may appear in a sequence, you can add the minOccurs and maxOccurs attributes. Once you have done that, you might as well define the type of data that appears in the Name element as well. The new schema looks like this:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" minOccurs="1" maxOccurs="1" type="xs:token" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Now you’re constrained to exactly one Name element, and its content must be a valid xs:token (a string with leading and trailing whitespace removed and internal runs of whitespace collapsed). By virtue of its data type constraint, this relatively simple XSD already expresses more than a DTD could.

The values of minOccurs and maxOccurs both default to 1, so this change was not strictly necessary, and I’ll omit them in the rest of the examples if they have the default values. The value of minOccurs must be a nonnegative integer, while maxOccurs may be any nonnegative integer greater than or equal to minOccurs, or the literal string “unbounded”.
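To illustrate the non-default combinations, here is a minimal sketch of a variant schema. The Nickname, Phone, and Reference elements are hypothetical and appear nowhere else in this chapter.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <!-- Defaults apply: exactly one Name -->
        <xs:element name="Name" type="xs:token" />
        <!-- Optional: zero or one Nickname (hypothetical) -->
        <xs:element name="Nickname" minOccurs="0" type="xs:token" />
        <!-- One or more Phone elements, with no upper limit (hypothetical) -->
        <xs:element name="Phone" maxOccurs="unbounded" type="xs:token" />
        <!-- Between zero and two Reference elements (hypothetical) -->
        <xs:element name="Reference" minOccurs="0" maxOccurs="2" type="xs:token" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```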

The type attribute can name any of the many predefined types. It can also name a custom type, as you’ll see in a moment.
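As a small sample of the predefined types, the following sketch declares a hypothetical Order element; the element names are assumptions made for illustration, while the type names are standard XML Schema built-ins.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element used only to show a few built-in types -->
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <!-- A calendar date such as 2004-03-15 -->
        <xs:element name="OrderDate" type="xs:date" />
        <!-- A whole number greater than zero -->
        <xs:element name="Quantity" type="xs:positiveInteger" />
        <!-- A decimal number such as 19.95 -->
        <xs:element name="Total" type="xs:decimal" />
        <!-- One of true, false, 1, or 0 -->
        <xs:element name="Paid" type="xs:boolean" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```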

This schema is acceptable, but customers have more information that could appear in the XML document. Customers should also have a customer ID and an address:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:token" />
        <xs:element name="Address" maxOccurs="unbounded" type="xs:string" />
      </xs:sequence>
      <xs:attribute name="Id" type="xs:ID" />
    </xs:complexType>
  </xs:element>
</xs:schema>

The document can now have one or more Address elements, containing data of type xs:string (that is, character data with whitespace retained) to hold freeform address information. According to the schema, the Address elements must come after the Name element, because xs:sequence constrains the order of elements. I’ve also added an Id attribute to the Customer element. Id’s value is of type xs:ID (it must contain only letters, digits, or the punctuation marks _, -, and .; must begin with a letter or underscore; and must be unique among all values of type xs:ID in the document).
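For instance, a document along these lines should be valid against the schema as it stands; the address text is made up for illustration.

```xml
<?xml version="1.0"?>
<!-- The Id value starts with a letter and contains no colon, as xs:ID requires -->
<Customer Id="customer.8873">
  <Name>Amalgamated Construction</Name>
  <!-- One or more freeform Address elements, always after Name -->
  <Address>81 San Leandro Blvd, Suite 5D</Address>
  <Address>PO Box 1234</Address>
</Customer>
```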

That Address element is not quite right, though. Although a freeform address may work well enough for many purposes, it really doesn’t take proper advantage of XML’s promise of structured data. Instead, a better document structure would look like this:

<Customer Id="customer.8873">
  <Name>Amalgamated Construction</Name>
  <Address>
    <Street>81 San Leandro Blvd</Street>
    <Street>Suite 5D</Street>
    <City>Albuquerque</City>
    <State>NM</State>
    <Zip>08765-9999</Zip>
  </Address>
</Customer>

The XSD for this document could be the following:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:token" />
        <xs:element name="Address" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Street" maxOccurs="3" type="xs:string" />
              <xs:element name="City" type="xs:string" />
              <xs:element name="State" type="xs:string" />
              <xs:element name="Zip" type="USZipCodeType" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="Id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>

  <xs:simpleType name="USZipCodeType">
    <xs:restriction base="xs:token">
      <xs:pattern value="\d{5}(-\d{4})?" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

With these changes, Address becomes an element that must have from one to three Street elements, and exactly one each of the City, State, and Zip elements. I also added a new twist by defining a simple type called USZipCodeType.

The xs:simpleType element defines USZipCodeType, a type that can be used in multiple places within the XSD. In this case, the type represents a United States zip code, which must be composed of either five numerals, or five numerals followed by a hyphen and four numerals; that is, nnnnn or nnnnn-nnnn. This pattern is expressed by the regular expression \d{5}(-\d{4})?. The xs:restriction and xs:pattern elements work together to restrict the value to a token that matches the regular expression in the value attribute.
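To show what reusing a named type might look like, here is a minimal sketch in which two elements share USZipCodeType. The Shipment, OriginZip, and DestinationZip elements are hypothetical and not part of the customer schema.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <!-- Hypothetical element: both children refer to the same named type -->
  <xs:element name="Shipment">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="OriginZip" type="USZipCodeType" />
        <xs:element name="DestinationZip" type="USZipCodeType" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Defined once at the top level, referenced by name above -->
  <xs:simpleType name="USZipCodeType">
    <xs:restriction base="xs:token">
      <xs:pattern value="\d{5}(-\d{4})?" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Because the type is declared as a direct child of xs:schema, any element or attribute declaration in the schema can refer to it by name.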

Tip

XML Schema’s regular expression syntax is based on Perl regular expressions, with some minor differences. To learn more about regular expressions, see Mastering Regular Expressions, 2nd Edition (O’Reilly).

Clearly you can keep going with this pattern of adding elements and attributes until the document is perfectly modeled. To sound a familiar refrain, XML Schema can do a lot more than this; see Eric van der Vlist’s XML Schema (O’Reilly) to learn more.
