An XML Schema document (XSD), like an XSLT stylesheet, is itself an XML
document. It may contain an XML declaration, and must contain a
namespace declaration for the URI http://www.w3.org/2001/XMLSchema.
This namespace is traditionally mapped to the prefix xs. The
document element of an XSD is xs:schema; the
simplest possible XSD, therefore, is the following:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" />
Of course, this XSD defines no structure, so it is mostly useless. To be more useful, it should include at least one element, representing the document element of the XML document it describes:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer" />
</xs:schema>
The xs:element element is called a particle. A particle can be
thought of as representing a single unit of markup, or a grouping of
such units. Other particles include xs:choice and xs:sequence,
among others. xs:all, xs:sequence, and xs:choice are
also compositors, elements that define groups of
particles.
A document using this schema would need to have the following content in order to be valid:
<Customer />
You may have already noticed that I’ve deviated from
the style used in earlier parts of this book by capitalizing the
first letter of the Customer
element. I’ll be
capitalizing the first letter of every element and attribute name in
this XSD. Hold that thought! I’ll explain the
different style in a little while.
Still not very useful, is it? Let’s add a little more to the XML Schema, customer.xsd:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
The xs:complexType
schema element indicates that its enclosing
element’s content is more than just simple text; it
actually has structure. This can be thought of
as the real minimum requirement for using XML Schema, because a
schema for a document with an empty document element is not very
useful at all.
The xs:sequence element contains an ordered list of elements. Other compositors
include xs:choice, which indicates that any one of
the listed elements may appear, and xs:all, which
indicates that the listed elements may appear in any order.
With xs:sequence, I’ve now
defined a document structure that looks like this:
<Customer>
  <Name>Amalgamated Construction</Name>
</Customer>
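For contrast with the sequence above, here is a rough sketch of the other two compositors. The Contact, Phone, Email, Person, First, and Last elements are hypothetical names chosen for illustration; they are not part of the customer schema.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element: exactly one of Phone or Email may appear -->
  <xs:element name="Contact">
    <xs:complexType>
      <xs:choice>
        <xs:element name="Phone" />
        <xs:element name="Email" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <!-- Hypothetical element: First and Last must both appear, in either order -->
  <xs:element name="Person">
    <xs:complexType>
      <xs:all>
        <xs:element name="First" />
        <xs:element name="Last" />
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```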
In
order to constrain the number of times an element may appear in a
sequence, you can add the minOccurs
and
maxOccurs
attributes. Once you have done that, you
might as well define the type of data that appears in the
Name
element as well. The new schema looks like
this:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" minOccurs="1" maxOccurs="1" type="xs:token" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Now you’re constrained to
exactly one Name
element, and its content may
consist of any valid XML token (a string with any whitespace
collapsed). By virtue of its data type constraint, this relatively
simple XSD is already more complex than anything that could have been
defined with a DTD.
The values of minOccurs and maxOccurs both default to 1, so this change was
not strictly necessary, and I’ll omit them in the
rest of the examples if they have the default values. The value of
minOccurs must be a nonnegative integer, while
maxOccurs may be any nonnegative integer greater
than or equal to minOccurs, or the literal string
“unbounded”.
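As a quick sketch of how these two attributes combine, the following schema shows the most common occurrence patterns. The Nickname and Phone elements are hypothetical and appear here only to illustrate the attribute values.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <!-- Defaults apply: exactly one Name -->
        <xs:element name="Name" type="xs:token" />
        <!-- Hypothetical: optional, at most one -->
        <xs:element name="Nickname" type="xs:token" minOccurs="0" />
        <!-- Hypothetical: zero or more -->
        <xs:element name="Phone" type="xs:token"
                    minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```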
The type attribute can name any one of a large number of predefined
(built-in) types. It can
also name custom types, as you’ll see in a moment.
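To give a feel for the predefined types, here is a small sketch that uses a few of them. The Order element and its children are hypothetical; only the type names come from XML Schema itself.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element, used only to show some built-in types -->
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <!-- A calendar date such as 2003-11-05 -->
        <xs:element name="OrderDate" type="xs:date" />
        <!-- A whole number greater than zero -->
        <xs:element name="Quantity" type="xs:positiveInteger" />
        <!-- A decimal number such as 19.95 -->
        <xs:element name="Total" type="xs:decimal" />
        <!-- One of true, false, 1, or 0 -->
        <xs:element name="Rush" type="xs:boolean" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```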
This schema is acceptable, but customers have more information that could appear in the XML document. Customers should also have a customer ID and an address:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:token" />
        <xs:element name="Address" maxOccurs="unbounded" type="xs:string" />
      </xs:sequence>
      <xs:attribute name="Id" type="xs:ID" />
    </xs:complexType>
  </xs:element>
</xs:schema>
The document can now have one or more Address elements, containing data of type
xs:string (that is, character data with whitespace
retained) to hold freeform address information. According to the
schema, the Address elements must come after the
Name element, because xs:sequence
constrains the order of elements.
I’ve also added an Id attribute
to the Customer element. Id’s value is of type
xs:ID (it must contain only letters, digits, or
the punctuation marks _, -, and .; must begin with a
letter or an underscore; and must be unique among all values of
type xs:ID in the document).
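As a sketch, an instance document that satisfies this version of the schema might look like the following. The Id value is made up for illustration, and the comment lists two values that I would expect a validator to reject.

```xml
<?xml version="1.0"?>
<!-- Valid: the Id starts with a letter and contains only
     letters, digits, and a period. -->
<Customer Id="customer.8873">
  <Name>Amalgamated Construction</Name>
  <Address>81 San Leandro Blvd, Suite 5D</Address>
</Customer>
<!-- Not valid as xs:ID values:
       Id="8873"           (begins with a digit)
       Id="customer 8873"  (contains a space)
     A second element carrying Id="customer.8873" in the same
     document would also fail, because xs:ID values must be unique. -->
```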
That Address
element is not quite right, though.
Although a freeform address may work well enough for many purposes,
it really doesn’t take proper advantage of
XML’s promise of structured data. Instead, a better
document structure would look like this:
<Customer Id="customer.8873">
  <Name>Amalgamated Construction</Name>
  <Address>
    <Street>81 San Leandro Blvd</Street>
    <Street>Suite 5D</Street>
    <City>Albuquerque</City>
    <State>NM</State>
    <Zip>08765-9999</Zip>
  </Address>
</Customer>
The XSD for this document could be the following:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:token" />
        <xs:element name="Address" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Street" maxOccurs="3" type="xs:string" />
              <xs:element name="City" type="xs:string" />
              <xs:element name="State" type="xs:string" />
              <xs:element name="Zip" type="USZipCodeType" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="Id" type="xs:ID" />
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="USZipCodeType">
    <xs:restriction base="xs:token">
      <xs:pattern value="\d{5}(-\d{4})?" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
With these changes, Address becomes an element
that must have from one to three Street elements,
and exactly one each of City, State, and Zip elements. I also
added in a new twist by defining a simple type called
USZipCodeType.
The xs:simpleType element defines USZipCodeType, a type that can be
used in multiple places within the XSD. In this case, the type
represents a United States zip code, which must be composed of either
five numerals, or five numerals followed by a hyphen and four
numerals; that is, nnnnn or nnnnn-nnnn. This pattern is expressed by the
regular expression \d{5}(-\d{4})?. The
xs:restriction and xs:pattern
elements work together to restrict the value to a token that matches
the regular expression in the value attribute.
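The same restriction-and-pattern technique should carry over to other fields. As a minimal sketch, a two-letter state code could be constrained as follows; USStateCodeType is a hypothetical name introduced here for illustration and is not part of the schema shown earlier.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <!-- An element that uses the hypothetical type by name -->
  <xs:element name="State" type="USStateCodeType" />
  <!-- Hypothetical type: exactly two uppercase ASCII letters, such as NM -->
  <xs:simpleType name="USStateCodeType">
    <xs:restriction base="xs:token">
      <xs:pattern value="[A-Z]{2}" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

An XML Schema pattern always has to match the entire value, so neither this expression nor the zip code expression needs ^ or $ anchors.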
XML Schema’s regular expression syntax is based on Perl regular expressions, with some minor differences. To learn more about regular expressions, see Mastering Regular Expressions, 2nd Edition (O’Reilly).
Clearly you can keep going with this pattern of adding elements and attributes until the document is perfectly modeled. To sound a familiar refrain, XML Schema can do a lot more than this; see Eric van der Vlist’s XML Schema (O’Reilly) to learn more.