Chapter 3. The W3C XML Schema Language

SOAP, WSDL, and UDDI are markup languages defined using the W3C XML Schema Language, so understanding the latter is critical to understanding J2EE Web Services. This chapter will provide you with a good understanding of both W3C XML Schema Language basics and, optionally, advanced concepts, so that you are ready to learn about SOAP, WSDL, and the UDDI standards covered later.

Throughout this chapter the term XML schema will be used to refer to the W3C XML Schema Language as a technology, while the word schema by itself will refer to a specific XML schema document.

XML Schema Basics

The XML specification includes the Document Type Definition (DTD), which can be used to describe XML markup languages and to validate instances of them (XML documents). While DTDs have proven very useful over the years, they are also limited. To address limitations of DTDs, the W3C (World Wide Web Consortium), which manages the fundamental XML standards, created a new way to describe markup languages called XML schema.

Why XML Schema Is Preferred to DTDs in Web Services

DTDs have done an adequate job of telling us how elements and attributes are organized in a markup language, but they fail to address data typing.

For example, the DTD in Listing 3-1 describes the valid organization of the Address Markup Language we created earlier. The DTD declares that an address element must contain one or more street elements followed by exactly one of each of the city, state, and zip elements. It also declares that the address element must have a category attribute.

Listing 3-1. A DTD

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT address (street+, city, state, zip)>
<!ELEMENT street (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
<!ATTLIST address category CDATA #REQUIRED >

A parser reading an XML instance determines whether it's valid by comparing it to its DTD—if it declares that it uses a DTD. To be valid, an XML instance must conform to its DTD, which means it must use the elements specified by the DTD in the correct order and multiplicity (zero, one, or many times).

While constraints provided by DTDs are useful for validating XML instances, the probability that an XML instance will have a valid organization but contain invalid data is pretty high. DTDs have a very weak typing system that restricts elements to four broad types of data: EMPTY, ANY, element content, or mixed element-and-text content. In other words, DTDs can only restrict elements to containing nothing, other elements, or text—not a very granular typing system. DTDs don't support types like integer, decimal, boolean, and enumeration. For example, the Address Markup DTD cannot restrict the contents of the zip element to an integer value or the state element to a set of valid state codes.
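To make that limitation concrete, here is a sketch of an instance that a validating parser would accept against the DTD in Listing 3-1 even though its state and zip values are nonsense. The file name address.dtd and the bogus values are assumptions made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address SYSTEM "address.dtd">
<!-- Hypothetical instance: the structure satisfies the DTD, the data does not make sense -->
<address category="business">
   <street>1516 2nd Ave</street>
   <city>Seattle</city>
   <!-- #PCDATA accepts any text, so neither value below is rejected -->
   <state>Not a state</state>
   <zip>not-a-number</zip>
</address>
```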

XML schema, by contrast, provides a much stronger type system. Many believe that XML schema is superior to DTD because it defines a richer type system, which includes simple primitives (integer, double, boolean, among others) as well as facilities for more complex types. XML schema facilitates type inheritance, which allows simple or complex types to be extended or restricted to create new types. In addition, XML schema supports the use of XML namespaces to create compound documents composed of multiple markup languages.
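As a rough sketch of what that richer type system makes possible, the fragment below restricts the built-in string type to create two new simple types. The type names USZipCode and USStateCode, the five-digit pattern, and the deliberately incomplete list of state codes are all assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook">

   <!-- Hypothetical type: exactly five digits -->
   <simpleType name="USZipCode">
      <restriction base="string">
         <pattern value="[0-9]{5}" />
      </restriction>
   </simpleType>

   <!-- Hypothetical type: only the listed codes are allowed (list is incomplete) -->
   <simpleType name="USStateCode">
      <restriction base="string">
         <enumeration value="KY" />
         <enumeration value="WA" />
      </restriction>
   </simpleType>
</schema>
```

An element declared with type="mh:USZipCode" would then reject a value such as "not-a-number", which a DTD has no way to express.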

Appendix A explains XML DTDs, but understanding the DTD schema language is not necessary for this book.

The XML Schema Document

A schema describes an XML markup language. Specifically it defines which elements and attributes are used in a markup language, how they are ordered and nested, and what their data types are.

A schema describes the structure of an XML document in terms of complex types and simple types. Complex types describe how elements are organized and nested. Simple types are the primitive data types contained by elements and attributes. For example, Listing 3-2 shows a portion of a schema that describes the Monson-Haefel Markup Language. Monson-Haefel Markup defines a set of XML schema types used by Monson-Haefel Books: USAddress, PurchaseOrder, Invoice, Shipping, and the like. At this point all the different types used by Monson-Haefel Books are combined into one schema; later you'll learn how to separate them into their own schemas and independent markup languages.

Listing 3-2. The Address Definition in a Schema

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
 xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
 targetNamespace="http://www.Monson-Haefel.com/jwsbook">

    <element name="address" type="mh:USAddress" />

    <complexType name="USAddress">
      <sequence>
        <element name="name"    type="string" />
        <element name="street"  type="string" />
        <element name="city"    type="string" />
        <element name="state"   type="string" />
        <element name="zip"     type="string" />
      </sequence>
    </complexType>
    ...
</schema>

The first thing you may have noticed is that Listing 3-2 is actually an XML document. That schemas are XML documents is a critical point: It makes the development of validating parsers and other software tools easier, because the operations that manipulate schemas can be based on XML parsers, which are already widely available. DTDs, the predecessor to schemas, were not based on XML, so processing them required special parsing.

The root element of a schema document is always the schema element. Nested within the schema element are element and type declarations. Listing 3-2 declares a complex type named USAddress, and an element of that type named address.

The schema element assigns the XML schema namespace ("http://www.w3.org/2001/XMLSchema") as the default namespace. This namespace is the standard namespace defined by the XML schema specification—all the XML schema elements must belong to this namespace. The schema element also defines the targetNamespace attribute, which declares the XML namespace of all new types explicitly created within the schema. For example, the USAddress type is automatically assigned to targetNamespace, "http://www.Monson-Haefel.com/jwsbook".

The schema element also uses an XML namespace declaration to assign the prefix mh to the targetNamespace. Subsequently, newly created types in the schema can be referred to as "mh:Typename". For example, the type attribute in the element declaration in Listing 3-2 refers to the USAddress as "mh:USAddress":

<element name="address" type="mh:USAddress" />

An instance document based on this schema would use the address element directly or refer to the USAddress type. When a parser that supports XML schema reads the document, it can validate the contents of the XML document against the USAddress type definition in Listing 3-2. Listing 3-3 shows a conforming XML instance.

Listing 3-3. An Instance of the Address Markup Language

<?xml version="1.0" encoding="UTF-8"?>
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
   <name>Amazon.com</name>
   <street>1516 2nd Ave</street>
   <city>Seattle</city>
   <state>WA</state>
   <zip>90952</zip>
</addr:address>

Using XML schema, we can state exactly how an instance of the address element should be organized and the types of data its elements and attributes should contain.

Simple Types

A simple type resembles a Java primitive type in that both are atomic; they cannot be broken down into constituent parts. In other words, a simple element type will not contain other elements; it will contain only data. The XML schema specification defines many standard simple types, called built-in types. The built-in types are the standard building blocks of an XML schema document. They are members of the XML schema namespace, "http://www.w3.org/2001/XMLSchema".

Table 3-1. Comparing the Use of XML Schema Simple Types and Java Primitive Types

XML Schema Built-in Simple Types

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
  targetNamespace="http://www.Monson-Haefel.com/jwsbook">
  ...
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName"   type="string" />
      <element name="accountNumber" type="integer" />
      <element name="total"         type="float" />
      <!-- More stuff follows -->
    </sequence>
  </complexType>
  ...
</schema>

Java Primitive Types

package com.monsonhaefel.jwsbook;

public class PurchaseOrder {

  String accountName;
  int accountNumber;
  float total;
  // more stuff follows

}

The PurchaseOrder complex type declares three of its elements using the XML schema built-in types: string, integer, and float. These simple types are similar to familiar types in the Java programming language and others. In a schema, simple types are used to construct complex types, much as Java primitives are used as fields of Java class definitions. Table 3-1 provides a comparison. The next section explains complex types in more detail.

The XML schema specification describes its 44 built-in simple types in precise detail. This precision enables XML parsers to process the built-in types predictably and consistently, for the most part, and provides a solid foundation for creating your own complex and custom simple types.

For example, the XML schema specification tells us that a string is defined as an unlimited length of characters based on the Universal Character Set;[1] an unsignedShort is a non-decimal number between 0 and 65,535; a float is a 32-bit floating-point type; and a date is represented as YYYY-MM-DD.
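The fragment below is a small sketch of values that conform to the four types just described. The element names are hypothetical; only the values matter.

```xml
<!-- Hypothetical elements; each comment names the built-in type the value conforms to -->
<samples>
   <note>Any Unicode text</note>       <!-- string -->
   <count>65535</count>                <!-- unsignedShort (the largest allowed value) -->
   <price>24.99</price>                <!-- float -->
   <shipped>2004-12-31</shipped>       <!-- date, written as YYYY-MM-DD -->
</samples>
```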

You can find complete and concise definitions of all the built-in types in XML Schema Part 2: Datatypes.[2] Table 3-2 provides a partial list, with brief definitions in plain English.

Table 3-2. A Subset of the XML Schema Built-in Simple Types

Simple Type        Definition

string             A sequence of characters conforming to UCS
normalizedString   A string without carriage returns, line feeds, or tabs
token              A normalizedString with no leading or trailing spaces and no runs of two or more spaces
NMTOKEN            A token made up only of XML name characters (so no whitespace), used mainly in attributes
byte               A non-decimal number between –128 and 127
unsignedByte       A non-decimal number between 0 and 255
base64Binary       Base64-encoded binary data (RFC 2045)[a]
hexBinary          Hex-encoded binary data[b]
integer            A base-10 integer of any size (…)[c]
positiveInteger    A base-10 integer greater than zero (1, 2, …)
negativeInteger    A base-10 integer less than zero (…, –2, –1)
int                A base-10 integer between –2,147,483,648 and 2,147,483,647 (–2 billion and 2 billion)
unsignedInt        A base-10 integer between 0 and 4,294,967,295 (zero and 4 billion)
long               A base-10 integer between –9,223,372,036,854,775,808 and 9,223,372,036,854,775,807 (–9 quintillion and 9 quintillion)
unsignedLong       A base-10 integer between 0 and 18,446,744,073,709,551,615 (zero and 18 quintillion)
short              A base-10 integer between –32,768 and 32,767
unsignedShort      A base-10 integer between 0 and 65,535
decimal            A decimal number of any precision and size
float              A decimal number conforming to the IEEE single-precision 32-bit floating-point type[d]
double             A decimal number conforming to the IEEE double-precision 64-bit floating-point type[d]
boolean            A boolean value of "true" or "false"
                   You can also use the values of "0" (false) or "1" (true); either convention is fine.
time               A time in hours, minutes, seconds, and fractional seconds formatted as hh:mm:ss.sss (e.g., 1:20 PM is 13:20:00)
                   You may append an optional time zone, either Z for Coordinated Universal Time (UTC) or an offset from it (e.g., 1:20 PM Eastern Standard Time (EST) is 13:20:00-05:00)[e]
date               A Gregorian date in centuries, years, months, and days (e.g., December 31, 2004 is 2004-12-31)[e]
dateTime           A Gregorian date measured in centuries, years, months, and days, with a time field set off by a T (e.g., 1:20 PM EST on December 31, 2004 would be 2004-12-31T13:20:00-05:00)[e]
duration           A span of time measured in years, months, days, hours, minutes, and seconds (e.g., 1 year, 2 months, 3 days, 10 hours, and 30 minutes would be P1Y2M3DT10H30M)
                   A duration may be negative, and zero-valued fields can be left off (e.g., 120 days earlier is -P120D). Apart from the optional leading minus sign, the value must always start with the letter P.[f]

[a] N. Freed and N. Borenstein, “RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies” (1996). Available at http://www.ietf.org/rfc/rfc2045.txt.

[b] A very good explanation of the hexadecimal numbering system can be found at http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/ch01/CH01-2.html#HEADING2-1.

[c] Computers can't actually support infinite numbers, so the XML schema specification requires that the parser must support at least 18 digits, which is a pretty huge number.

[d] Institute of Electrical and Electronics Engineers, “IEEE Standard for Binary Floating-Point Arithmetic”. See http://standards.ieee.org/reading/ieee/std_public/description/busarch/754-1985_desc.html.

[e] International Organization for Standardization (ISO). “Representations of dates and times” (1988).

[f] The lexical form of the duration type follows the ISO 8601 extended format PnYnMnDTnHnMnS; the XML schema specification additionally allows a leading minus sign to indicate a negative duration.

All built-in simple and complex types are ultimately derived from anyType, which is the ultimate base type, like the Object class in Java. The XML Schema Part 2: Datatypes specification offers a diagram of the data type hierarchy; see Figure 3-1.

Figure 3-1. XML Schema Type Hierarchy

Complex Types

A schema may declare complex types, which define how elements that contain other elements are organized. The USAddress schema type in Listing 3-2, for example, is a complex type definition for a United States postal address. It tells us that an element based on this type will contain five other elements called name, street, city, state, and zip.

A complex type is analogous to a Java class definition with fields but no methods. The fields in a Java class declare the names and types of variables that an instance of that class will contain. Similarly, a complex type declares the names and types of elements and attributes that an XML instance of that type may contain. An instance of a complex type is an element in an XML document. Table 3-3 compares an XML schema type and a Java class definition for a U.S. address.

Table 3-3. Comparing XML Schema Complex Types to Java Class Definitions

XML Schema: Complex Type

<complexType name="USAddress">
   <sequence>
     <element name="name" type="string" />
     <element name="street" type="string" />
     <element name="city" type="string" />
     <element name="state" type="string" />
     <element name="zip" type="string" />
   </sequence>
</complexType>

Java Class Definition

public class USAddress {
   public String name;
   public String street;
   public String city;
   public String state;
   public String zip;
}

While this analogy between XML schema complex types and Java class definitions is helpful, take care not to confuse them. A schema is used to define elements and attributes in a markup language and verify the correctness of an XML instance; it's not a computer program.

Sequences of Elements

Most complexType declarations in schemas will contain a sequence element that lists one or more element definitions. The element definitions tell you which elements are nested in the type, the order in which they appear, and the kind of data each element contains.

The USAddress type clearly defines the proper structure of a U.S. postal address and can be used to verify the proper contents of any element based on that type. For example, the address element used throughout Chapter 2 could be an instance of the type USAddress, and we could use that type to verify the contents of the address element when it was used in an XML instance. Table 3-4 shows the USAddress type alongside the address element so you can see how a complex type definition maps to an XML instance.

A complex type may contain a sequence of elements that are simple types or other complex types. For example, we can define an element for a purchase-order document by adding a PurchaseOrder type to the Monson-Haefel Markup Language you saw in Listing 3-2. In Listing 3-4, the new PurchaseOrder type has two nested elements, billAddress and shipAddress, both of type USAddress.

Table 3-4. Mapping a Schema Complex Type to an XML Element

XML Schema: USAddress

<complexType name="USAddress">
   <sequence>
      <element name="name" type="string" />
      <element name="street" type="string" />
      <element name="city" type="string" />
      <element name="state" type="string" />
      <element name="zip" type="string" />
   </sequence>
</complexType>

XML Document: address

<address>
   <name>Amazon.com</name>
   <street>1516 2nd Ave</street>
   <city>Seattle</city>
   <state>WA</state>
   <zip>90952</zip>
</address>

Listing 3-4. The PurchaseOrder Type in a Schema

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
 xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
 targetNamespace="http://www.Monson-Haefel.com/jwsbook" >

<element name="purchaseOrder" type="mh:PurchaseOrder" />
<element name="address" type="mh:USAddress" />

<complexType name="PurchaseOrder">
   <sequence>
      <element name="accountName"   type="string" />
      <element name="accountNumber" type="unsignedShort" />
      <element name="shipAddress"   type="mh:USAddress" />
      <element name="billAddress"   type="mh:USAddress" />
      <element name="book"          type="mh:Book" />
      <element name="total"         type="float" />
   </sequence>
</complexType>

<complexType name="USAddress">
   <sequence>
      <element name="name"    type="string" />
      <element name="street"  type="string" />
      <element name="city"    type="string" />
      <element name="state"   type="string" />
      <element name="zip"     type="string" />
   </sequence>
</complexType>

<complexType name="Book">
   <sequence>
     <element name="title"           type="string" />
     <element name="quantity"        type="unsignedShort" />
     <element name="wholesale-price" type="float" />
   </sequence>
</complexType>

</schema>

The schema makes use of both complex types (PurchaseOrder, USAddress, and Book) and simple types (string, unsignedShort, and float).

The USAddress type is a member of the targetNamespace, so we refer to it by its fully qualified name, "mh:USAddress". (Recall that targetNamespace is assigned the namespace prefix mh in the schema element.)

As you can see, the PurchaseOrder type takes full advantage of USAddress by using it to define both its billAddress and shipAddress elements. In this way, complex type declarations can build on other complex type definitions to create rich types that easily describe very complex XML structures. The PurchaseOrder type also uses Book, another complex type that describes the book being ordered.

The names of XML schema types are case-sensitive. When an element declares that it is of a particular type, it must specify both the namespace and the name of that type exactly as the type declares them.
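The following fragment sketches what that rule means in practice, assuming the namespace declarations used in Listing 3-4 (XML schema as the default namespace, mh bound to the targetNamespace). The two commented-out declarations are hypothetical mistakes, shown only for contrast.

```xml
<!-- Resolves: the mh prefix maps to the targetNamespace and the name matches exactly -->
<element name="shipAddress" type="mh:USAddress" />

<!-- Would not resolve: type names are case-sensitive -->
<!-- <element name="shipAddress" type="mh:usaddress" /> -->

<!-- Would not resolve: with no prefix, the name is looked up in the default
     namespace, which here is the XML schema namespace -->
<!-- <element name="shipAddress" type="USAddress" /> -->
```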

Attributes

In addition to sequences of elements, a complex type may also define its own attributes. For example, Listing 3-5 shows a new version of the PurchaseOrder type that includes the definition of an orderDate attribute.

Listing 3-5. Adding an Attribute to a Complex Type

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
    targetNamespace="http://www.Monson-Haefel.com/jwsbook">

  <element name="purchaseOrder" type="mh:PurchaseOrder"/>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:USAddress"/>
      <element name="billAddress" type="mh:USAddress"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="float"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
  <complexType name="USAddress">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
  <complexType name="Book">
    <sequence>
      <element name="title" type="string"/>
      <element name="quantity" type="unsignedShort"/>
      <element name="wholesale-price" type="float"/>
    </sequence>
  </complexType>
</schema>

The next code sample, Listing 3-6, shows a valid XML document based on the PurchaseOrder type defined by the schema you saw in Listing 3-5. The XML document in Listing 3-6 would contain all the elements and the orderDate attribute as described by the PurchaseOrder complex type—and would be verifiable against that type.

Listing 3-6. An Instance of the Schema in Listing 3-5

<?xml version="1.0" encoding="UTF-8"?>
<po:purchaseOrder orderDate="2003-09-22"
 xmlns:po="http://www.Monson-Haefel.com/jwsbook">
   <accountName>Amazon.com</accountName>
   <accountNumber>923</accountNumber>
   <shipAddress>
      <name>AMAZON.COM</name>
      <street>1850 Mercer Drive</street>
      <city>Lexington</city>
      <state>KY</state>
      <zip>40511</zip>
   </shipAddress>
   <billAddress>
      <name>Amazon.com</name>
      <street>1516 2nd Ave</street>
      <city>Seattle</city>
      <state>WA</state>
      <zip>90952</zip>
   </billAddress>
   <book>
      <title>J2EE Web Services</title>
      <quantity>300</quantity>
      <wholesale-price>24.99</wholesale-price>
   </book>
   <total>8997.00</total>
</po:purchaseOrder>

Occurrence Constraints

The multiplicity of an element, the number of times it occurs in an instance document, is controlled by occurrence constraints, which are declared by the maxOccurs and minOccurs attributes. For example, we can enhance the USAddress complex type by placing occurrence constraints on the street element as shown in Listing 3-7.

Listing 3-7. Using Occurrence Constraints

<complexType name="USAddress">
   <sequence>
      <element name="name"    type="string" />
      <element name="street"  type="string"
                              minOccurs="1" maxOccurs="2" />
      <element name="city"    type="string" />
      <element name="state"   type="string" />
      <element name="zip"     type="string" />
   </sequence>
</complexType>

The occurrence constraints specify that in any instance of USAddress the street element must be present at least once and at most twice. In other words, a USAddress can contain either one or two street elements. The default value for both maxOccurs and minOccurs is "1", so if these attributes are not specified the element must be present exactly once. Thus, by default, each USAddress must have exactly one name, city, state, and zip.
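As an illustration, here is a sketch of an instance that uses the street element twice, the maximum that Listing 3-7 permits. It assumes the address element is declared with type USAddress as in Listing 3-2; the second street line is invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
   <name>Amazon.com</name>
   <!-- street appears twice: allowed because minOccurs="1" and maxOccurs="2" -->
   <street>1516 2nd Ave</street>
   <street>Suite 1200</street>   <!-- hypothetical second line -->
   <city>Seattle</city>
   <state>WA</state>
   <zip>90952</zip>
</addr:address>
```

Adding a third street element, or leaving street out altogether, would make the instance invalid.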

The minOccurs attribute may be "0", indicating that an element is optional, or any positive integer value that is less than or equal to the maxOccurs value. The maxOccurs value may be any positive integer greater than or equal to the minOccurs value.

minOccurs ≥ 0
maxOccurs ≥ minOccurs

You may also define a maxOccurs value to be "unbounded" to specify that the element may occur an unlimited number of times.

For example, suppose Monson-Haefel Books wants to avoid storing a billing address that is identical to the shipping address, and to allow customers to buy an unlimited number of books on a single order. We can redefine the PurchaseOrder type, setting the occurrence constraints on the billAddress and the book elements as shown in Listing 3-8.

Listing 3-8. Using the "unbounded" Occurrence Value

<complexType name="PurchaseOrder">
   <sequence>
      <element name="accountName"   type="string" />
      <element name="accountNumber" type="unsignedShort" />
      <element name="shipAddress"   type="mh:USAddress" />
      <element name="billAddress"   type="mh:USAddress"
                                       minOccurs="0" />
      <element name="book"          type="mh:Book"
                                       maxOccurs="unbounded" />
      <element name="total"         type="float" />
   </sequence>
   <attribute name="orderDate" type="date" />
</complexType>

The billAddress element is now optional. It may occur at most once, because its maxOccurs value is "1" by default, but it may also be omitted because its minOccurs value is "0". The book element must be present at least once because the default value of minOccurs is "1", but it may be repeated many times because its maxOccurs is "unbounded".
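A sketch of an instance that exercises both constraints might look like the following. It assumes a purchaseOrder element declared with the PurchaseOrder type from Listing 3-8; the second book and the total are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<po:purchaseOrder orderDate="2003-09-22"
 xmlns:po="http://www.Monson-Haefel.com/jwsbook">
   <accountName>Amazon.com</accountName>
   <accountNumber>923</accountNumber>
   <shipAddress>
      <name>AMAZON.COM</name>
      <street>1850 Mercer Drive</street>
      <city>Lexington</city>
      <state>KY</state>
      <zip>40511</zip>
   </shipAddress>
   <!-- billAddress is omitted, which minOccurs="0" allows -->
   <book>
      <title>J2EE Web Services</title>
      <quantity>300</quantity>
      <wholesale-price>24.99</wholesale-price>
   </book>
   <!-- a second book, which maxOccurs="unbounded" allows (hypothetical data) -->
   <book>
      <title>A Hypothetical Second Title</title>
      <quantity>10</quantity>
      <wholesale-price>19.99</wholesale-price>
   </book>
   <total>7696.90</total>
</po:purchaseOrder>
```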

Attributes also have occurrence constraints, but they are different from those of elements. Instead of maxOccurs and minOccurs, attribute declarations specify the use occurrence constraint, which may be "required", "optional", or "prohibited", indicating that the attribute must, may, or may not be used, respectively. The default is "optional". An attribute might be "prohibited" if you want to stop the use of a particular attribute, perhaps one that is inappropriate or no longer in use.

In PurchaseOrder we want to make the orderDate attribute mandatory, so Listing 3-9 sets its use occurrence constraint to "required".

Listing 3-9. Declaring the use Value of an Attribute

<complexType name="PurchaseOrder">
   <sequence>
      <element name="accountName"   type="string" />
      <element name="accountNumber" type="unsignedShort" />
      <element name="shipAddress"   type="mh:USAddress" />
      <element name="billAddress"   type="mh:USAddress"
                                        minOccurs="0" />
      <element name="book"          type="mh:Book"
                                        maxOccurs="unbounded" />
      <element name="total"         type="float" />
   </sequence>
   <attribute name="orderDate" type="date" use="required" />
</complexType>

An attribute may also have a default value, to be assigned if no value is explicitly declared in the instance document. For example, the USAddress type may include an attribute called category that can have the value "business", "private", or "government". Almost all addresses used by Monson-Haefel Books are business addresses, so we set the default for the category attribute to "business" in Listing 3-10.

Listing 3-10. Declaring the Default Value of an Attribute

<complexType name="USAddress">
   <sequence>
      <element name="name"    type="string" />
      <element name="street"  type="string" />
      <element name="city"    type="string" />
      <element name="state"   type="string" />
      <element name="zip"     type="string" />
   </sequence>
   <attribute name="category" type="string" default="business" />
</complexType>

The default attribute can be used only when the use attribute is "optional" (recall that "optional" is the default value for the use attribute). It wouldn't make sense to declare a default when the use is "required" or "prohibited". If the use attribute is "required", there is no need for a default because the attribute must appear in the instance document. If the use is "prohibited", the attribute's not allowed so there is no sense having a default value.
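For illustration, the instance sketched below leaves out the category attribute. It assumes the address element is declared with the USAddress type from Listing 3-10.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- No category attribute is written here. A schema-validating parser
     treats the element as if category="business" had been specified. -->
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
   <name>Amazon.com</name>
   <street>1516 2nd Ave</street>
   <city>Seattle</city>
   <state>WA</state>
   <zip>90952</zip>
</addr:address>
```

Writing category="private" on the address element would simply override the default.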

An attribute may also be declared fixed, which means it always has the same value. If the attribute is omitted from the XML instance document, the fixed value is assigned to it; if it is present, its value must match the fixed value or the document is invalid. This feature is useful in rare situations where you want to force a particular attribute always to have the same value. For example, if a particular schema is assigned a version number, then that version number should be fixed for that schema (UDDI does this).
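A fixed attribute might be declared along these lines. This is only a sketch: the attribute name version and its value "1.0" are assumptions, not part of the Monson-Haefel schema.

```xml
<complexType name="USAddress">
   <sequence>
      <element name="name"    type="string" />
      <element name="street"  type="string" />
      <element name="city"    type="string" />
      <element name="state"   type="string" />
      <element name="zip"     type="string" />
   </sequence>
   <!-- Hypothetical attribute: an instance that supplies version
        must use exactly the value "1.0" -->
   <attribute name="version" type="string" fixed="1.0" />
</complexType>
```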

The all Element

Most of the time you'll base complex types on sequence elements, but occasionally you may want to use the all element. Unlike sequence, which defines the exact order of child elements, the XML schema all element allows the elements in it to appear in any order. Each element in an all group may occur once or not at all; no other multiplicity is allowed. In other words, minOccurs may be "0" or "1" (the default), and maxOccurs is always "1". Finally, only single elements may be used in an all group; it can't include other groupings like sequence or all. Listing 3-11 shows the schema for the address element using the all element grouping instead of sequence.

Listing 3-11. Using the XML Schema all Element

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
            targetNamespace="http://www.Monson-Haefel.com/jwsbook" >
   ...
   <complexType name="USAddress">
      <all>
         <element name="name"    type="string" />
         <element name="street"  type="string" />
         <element name="city"    type="string" minOccurs="0"/>
         <element name="state"   type="string" minOccurs="0"/>
         <element name="zip"     type="string" />
      </all>
   </complexType>
   ...
</schema>

In Listing 3-11 the name, street, and zip elements must be present in the instance document, but the city and state elements may be absent. The elements can be in any order, but none of the elements may occur more than once. Listing 3-12 shows a valid instance of the USAddress type as defined using the all element in Listing 3-11.

Listing 3-12. An Instance of the Schema in Listing 3-11

<?xml version="1.0" encoding="UTF-8"?>
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook" >
   <zip>90952</zip>
   <street>1516 2nd Ave</street>
   <name>Amazon.com</name>
</addr:address>

Notice the missing city and state elements and that the order of the elements is different from that in the type definition.

Declaring Global Elements in a Schema

In addition to declaring simple and complex types, a schema may also declare global elements, which XML instance documents can refer to directly. Global elements are declared as direct children of the schema element, rather than children of a complex type. For example, the following shows a portion of the schema defined in Listing 3-5, which declares the purchaseOrder element to be global.

<?xml version="1.0" encoding="UTF-8"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
    targetNamespace="http://www.Monson-Haefel.com/jwsbook">

  <element name="purchaseOrder" type="mh:PurchaseOrder"/>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:USAddress"/>
      <element name="billAddress" type="mh:USAddress"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="float"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
  ...
</schema>

An XML document based on Listing 3-5 can use the purchaseOrder element as in Listing 3-6.

<?xml version="1.0" encoding="UTF-8"?>
<po:purchaseOrder orderDate="2003-09-22"
 xmlns:po="http://www.Monson-Haefel.com/jwsbook">

   <accountName>Amazon.com</accountName>
   <accountNumber>923</accountNumber>
   <shipAddress>
   ...

</po:purchaseOrder>

The root element of a valid XML document must have a corresponding global element declaration in the schema. A schema may define more than one global element. For example, we can modify the schema for Monson-Haefel Books so that it declares two global elements: purchaseOrder and address. Listing 3-13 illustrates.

Example 3-13. Defining Multiple Element Declarations

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
    targetNamespace="http://www.Monson-Haefel.com/jwsbook">

  <element name="address" type="mh:USAddress"/>
  <element name="purchaseOrder" type="mh:PurchaseOrder"/>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:USAddress"/>
      <element name="billAddress" type="mh:USAddress"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="float"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
  <complexType name="USAddress">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
  ...
</schema>

The schema in Listing 3-13 allows you to create XML documents in which the purchaseOrder element is the root, but it also allows you to create XML documents in which the address element is the root. Listing 3-14 is an XML document that defines the address element as its root element and conforms to the schema in Listing 3-13.

Example 3-14. An Address Document Based on the Monson-Haefel Books Schema

<?xml version="1.0" encoding="UTF-8"?>
<addr:address
    xmlns:addr="http://www.Monson-Haefel.com/jwsbook">

  <name>AMAZON.COM</name>
  <street>1850 Mercer Drive</street>
  <city>Lexington</city>
  <state>KY</state>
  <zip>40511</zip>
</addr:address>

By declaring two different global elements in the Monson-Haefel Books schema (Listing 3-13), you effectively create two schema-verifiable markup languages, a Purchase Order Markup Language and a U.S. Address Markup Language. The implication here is that a single schema can be used to validate two—indeed many—different kinds of documents. XML schema also supports global attributes that can be referred to anywhere in the schema, and that provide a consistent attribute name and type across elements. An example of a standard global attribute is xml:lang, which any element can use to indicate the language used in an element's value ("es" for Spanish, "en" for English, and so on).

Local elements are those declared within the scope of a complex type. In Listing 3-13 all the elements, except for purchaseOrder and address, are local elements, because they are declared within one complex type or another. Similarly, orderDate is a local attribute. Table 3-5 illustrates.

Table 3-5. Global and Local Elements in Listing 3-13

Global Elements     Local Elements
purchaseOrder       accountName
address             accountNumber
                    shipAddress
                    billAddress
                    book
                    total
                    name
                    street
                    city
                    state
                    zip

In a nutshell, global elements and attributes are declared as direct children of the schema element, while local elements and attributes are not; they are the children of complex types.
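As a minimal sketch of the difference, the following schema declares a global attribute as a direct child of schema and then refers to it from a complex type. The category attribute and its use on USAddress are assumptions made for illustration; they are not part of the Monson-Haefel Books schema in Listing 3-13.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
    targetNamespace="http://www.Monson-Haefel.com/jwsbook">

  <!-- Global attribute (hypothetical): a direct child of schema -->
  <attribute name="category" type="string"/>

  <!-- Global element: a direct child of schema -->
  <element name="address" type="mh:USAddress"/>

  <complexType name="USAddress">
    <sequence>
      <!-- Local elements: children of a complex type -->
      <element name="name" type="string"/>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
    <!-- Refers to the global attribute instead of declaring a local one -->
    <attribute ref="mh:category"/>
  </complexType>
</schema>
```

Because category is global here, an instance document would write it with a prefix, for example addr:category="business" on the addr:address element.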

Qualified and Unqualified Elements

In Section 2.2.2 you learned that elements can be qualified by a namespace, or unqualified; that is, that elements in an XML document may or may not require QName prefixes. Global elements and attributes must always be qualified, which means that in an XML instance you must prefix them to form a QName. The exception is when a global element is a member of the default namespace, in which case it does not have to be qualified with a prefix—all unqualified elements are assumed to be part of the default namespace. The default namespace does not apply to global attributes; global attributes must always be prefixed.

While global elements and attributes must always be qualified, local elements may not need to be qualified. XML schema defines two attributes, elementFormDefault and attributeFormDefault, that determine whether local elements and attributes in an XML instance need to be qualified with a prefix or not. For example, the schema for the Address Markup Language can be modified to require namespace prefixes on all local elements in an XML instance, as in Listing 3-15.

Example 3-15. Declaring That Elements Must Be Namespace-Qualified

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
 xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
 targetNamespace="http://www.Monson-Haefel.com/jwsbook"
 elementFormDefault="qualified" >

    <element name="address" type="mh:USAddress" />

    <complexType name="USAddress">
      <sequence>
        <element name="name"    type="string" />
        <element name="street"  type="string" />
        <element name="city"    type="string" />
        <element name="state"   type="string" />
        <element name="zip"     type="string" />
      </sequence>
    </complexType>
    ...
</schema>

When the elementFormDefault attribute is set to "qualified", in any XML instance all the local elements in the targetNamespace must be qualified with a prefix. For example, Listing 3-16 shows an XML instance that conforms to the schema in Listing 3-15.

Example 3-16. Qualified Local Elements in an XML Document

<?xml version="1.0" encoding="UTF-8"?>
<addr:address
 xmlns:addr="http://www.Monson-Haefel.com/jwsbook" >

  <addr:name>AMAZON.COM</addr:name>
  <addr:street>1850 Mercer Drive</addr:street>
  <addr:city>Lexington</addr:city>
  <addr:state>KY</addr:state>
  <addr:zip>40511</addr:zip>
</addr:address>

If, on the other hand, the value for elementFormDefault is "unqualified", only the global elements must be qualified. Listing 3-17 represents a valid XML instance when elementFormDefault is "unqualified". Notice that the address element is qualified with the addr prefix, but the local elements (name, street, city, state, and zip) are not.

Example 3-17. Unqualified Local Elements in an XML Document

<?xml version="1.0" encoding="UTF-8"?>
<addr:address
 xmlns:addr="http://www.Monson-Haefel.com/jwsbook" >

  <name>AMAZON.COM</name>
  <street>1850 Mercer Drive</street>
  <city>Lexington</city>
  <state>KY</state>
  <zip>40511</zip>
</addr:address>

The attributeFormDefault attribute works in exactly the same way. If the value is "qualified", then the local attributes of targetNamespace must be qualified with a prefix. If attributeFormDefault is "unqualified", they need not be.
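A minimal sketch of attributeFormDefault in use follows. It assumes a local category attribute on the USAddress type, which is hypothetical and added only to give the setting something to act on; the element list is shortened for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
 xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
 targetNamespace="http://www.Monson-Haefel.com/jwsbook"
 attributeFormDefault="qualified" >

    <element name="address" type="mh:USAddress" />

    <complexType name="USAddress">
      <sequence>
        <!-- elementFormDefault is not set, so local elements stay unqualified -->
        <element name="name" type="string" />
        <element name="zip"  type="string" />
      </sequence>
      <!-- Local attribute (hypothetical); instances must prefix it -->
      <attribute name="category" type="string" />
    </complexType>
</schema>
```

Under this schema an instance would write the root element as addr:address with the attribute addr:category="business", while the name and zip child elements would carry no prefix.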

The default value of both the elementFormDefault and the attributeFormDefault attributes is "unqualified", so if they're not used then the local attributes and elements of targetNamespace do not need to be qualified. All of the XML documents before Listing 3-14 were unqualified by default, which is why the global elements (address and purchaseOrder) had prefixes but the other elements did not.

If the XML document declares a default namespace, then all elements without prefixes are assigned to that namespace. This rule makes things tricky because unqualified elements are not supposed to be qualified, yet if there is a default namespace, then they are assigned to that namespace and are effectively qualified. As an exercise can you explain why the XML document in Listing 3-18 is valid for the XML schema in Listing 3-15?

Listing 3-15 requires that all elements be qualified. Listing 3-18 declares the default namespace, which is the namespace automatically assigned any element that is not prefixed, so even though the elements in Listing 3-18 are not prefixed, they are qualified and are therefore valid when checked against the XML schema in Listing 3-15.
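A minimal sketch of such a document, assuming the address schema in Listing 3-15, might look like the following. No element carries a prefix, yet every element belongs to the Monson-Haefel namespace because of the default namespace declaration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The default namespace applies to address and to every unprefixed descendant -->
<address xmlns="http://www.Monson-Haefel.com/jwsbook">
  <name>AMAZON.COM</name>
  <street>1850 Mercer Drive</street>
  <city>Lexington</city>
  <state>KY</state>
  <zip>40511</zip>
</address>
```

The same document would not be valid against a schema that leaves elementFormDefault at "unqualified", because the local elements would then be expected to belong to no namespace at all.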

You are free to configure your schemas any way you want, but I've found that it's generally less confusing if you require that all elements be namespace-qualified by setting elementFormDefault equal to "qualified". That said, this book uses both qualified and unqualified local elements with abandon. You'll see this kind of inconsistency in your real-world development efforts, and it's best if you get used to thinking about local-element qualification early in your work with XML.

Assigning and Locating Schemas

The whole point of schemas is that they define the grammar by which XML documents can be validated. In other words, schemas are used by parsers to verify that an XML document conforms to a specific markup language.

To validate an XML document against one or more schemas, you need to specify which schemas to use. You do so by identifying the schemas' locations, using the schemaLocation attribute, which is an XML schema-instance attribute.

The XML document in Listing 3-18 uses this attribute to declare the location of the one schema it's based on.

Example 3-18. Using schemaLocation with XML documents

<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
   xmlns="http://www.Monson-Haefel.com/jwsbook"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.Monson-Haefel.com/jwsbook
       http://www.Monson-Haefel.com/jwsbook/po.xsd">
  <accountName>Amazon.com</accountName>
  <accountNumber>923</accountNumber>
  <shipAddress>
    <name>AMAZON.COM</name>
    <street>1850 Mercer Drive</street>
    <city>Lexington</city>
    <state>KY</state>
    <zip>40511</zip>
  </shipAddress>
  <billAddress>
    <name>Amazon.com</name>
    <street>1516 2nd Ave</street>
    <city>Seattle</city>
    <state>WA</state>
    <zip>90952</zip>
  </billAddress>
  <book>
    <title>J2EE Web Services</title>
    <quantity>300</quantity>
    <wholesale-price>24.99</wholesale-price>
  </book>
  <total>8997.00</total>
</purchaseOrder>

The second namespace declared in Listing 3-18, xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", is the XML schema-instance namespace, which is defined by the XML schema specification. The XML schema specification explicitly defines a few attributes belonging to this namespace, which can be used in XML documents, including the xsi:schemaLocation attribute. Another important attribute from the XML schema-instance namespace is xsi:type, which is addressed in Section 3.2.

The xsi:schemaLocation attribute helps an XML processor locate the actual physical schema document used by the XML instance. Each schema is listed in an xsi:schemaLocation attribute as a namespace-location pair, which associates a namespace with a physical URL. In Listing 3-18, the Monson-Haefel namespace, "http://www.Monson-Haefel.com/jwsbook", is associated with a schema file located at Monson-Haefel Books' Web site. You can use xsi:schemaLocation to point at several schemas if you need to. For example, we can add the schema location for the XML schema-instance, as in Listing 3-19.

Example 3-19. Declaring Multiple Schema Locations

<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
  xmlns="http://www.Monson-Haefel.com/jwsbook"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.Monson-Haefel.com/jwsbook
                      http://www.Monson-Haefel.com/jwsbook/po.xsd

                      http://www.w3.org/2001/XMLSchema-instance
                      http://www.w3.org/2001/XMLSchema.xsd">

You use white space to separate the namespace and the location URL in each namespace-location pair, and to separate the pairs from each other. For readability, it's a good idea to use more white space between pairs than between each namespace and its location.

You don't actually need to specify the XML schema-instance schema location,[3] because it must be supported natively by any XML schema validating parser, but you should list any other schemas used in an XML document.

For the schemas identified by xsi:schemaLocation to be useful, they must explicitly define themselves as belonging to one of the namespaces identified in the XML instance document. In this case the schema, Listing 3-12, belongs to the Monson-Haefel Books namespace, "http://www.Monson-Haefel.com/jwsbook", the same namespace specified by the instance document.

A schema can be located on the Internet, as the Monson-Haefel Books schema in Listing 3-18 is, or on a local hard drive. When using a local schema, specify the location relative to the directory in which the XML document is located. For example, Listing 3-20 shows a schema that's in the same local directory as the XML instance.

Example 3-20. Pointing to a Schema on a Local File System

<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
   xmlns="http://www.Monson-Haefel.com/jwsbook"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.Monson-Haefel.com/jwsbook
                       po.xsd">
  <accountName>Amazon.com</accountName>
  <accountNumber>923</accountNumber>

It's important to note that the xsi:schemaLocation attribute is considered a “hint” by the XML schema specification, which means that XML parsers are not required to use the schema identified by xsi:schemaLocation. A good parser will use it, however, and some, like Xerces-J, allow you to override the location identified by the xsi:schemaLocation attribute programmatically. This is useful if you want to avoid downloading the schema every time an XML document based on it is parsed; you can use a cached copy instead of the original.

The xsi:schemaLocation attribute is usually declared in the root element of an XML document, but it doesn't have to be. You can declare it later in the document, as long as it's in the scope of the elements it applies to.

Advanced XML Schema

The key goal of Web services is interoperability, so choosing technologies and standards like XML, SOAP, and WSDL, which are supported by the majority of platforms, is critical. XML is the foundation of Web service interoperability, but even XML can trip you up if you're not careful, particularly the more advanced XML schema types. The painful truth is that XML schema is still new, and some Web service platforms do not support all of its features. That said, according to the WS-I Basic Profile 1.0, Web services must support all of the XML schema features, including those covered in this “Advanced” section.

Inheritance of Complex Types

XML schema supports type inheritance much as object-oriented programming languages do, but XML schema inheritance is actually more comprehensive than in most object-oriented languages. Unfortunately, the richness of XML schema inheritance can cause interoperability headaches.

Many Web service platforms map XML schema types to native primitive types, structures, and objects so that developers can manipulate XML data using constructs native to their programming environment. For example, JAX-RPC maps some of the XML schema built-in types to Java primitives, and basic complex types to Java beans. JAX-RPC can map most derived complex types to Java beans, but not all. Similar limitations are found in other platforms like .NET and SOAP::Lite for Perl. Most object-oriented languages do not support the full scope of inheritance defined by the XML schema specification. For this reason, you should use type inheritance in schemas with care.

Complex types can use two types of inheritance: extension and restriction. Both allow you to derive new complex types from existing complex types. Extension broadens a derived type by adding elements or attributes not present in the base type, while restriction narrows a derived type by omitting or constraining elements and attributes defined by the base type.

Extension

An extension type inherits the elements and attributes of its base type, and adds new ones. For example, we could redefine the USAddress type to be an extension of a base type called Address as shown in Listing 3-21.

Example 3-21. Using XML Schema Inheritance

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schema
  targetNamespace="http://www.Monson-Haefel.com/jwsbook"
  xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
  xmlns="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">

<element name="address" type="mh:Address"/>

<complexType name="Address">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string" maxOccurs="unbounded"/>
      <element name="city" type="string"/>
      <element name="country" type="string"/>
    </sequence>
    <attribute name="category" type="string" default="business"/>
  </complexType>

  <complexType name="USAddress">
    <complexContent>
      <extension base="mh:Address">
        <sequence>
          <element name="state" type="string"/>
          <element name="zip" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
  ...
</schema>

The complexType and extension elements in Listing 3-21 tell us that USAddress extends Address. It adds the state and zip elements after those inherited from the base type, so that the USAddress type has a total of six elements (name, street, city, country, state, and zip, in that order).

The base type Address defined in Listing 3-21 can be used to create other derived types as well. For example, we could extend it to define a United Kingdom address type, UKAddress, as in Listing 3-22.

Example 3-22. A UK Address Type Extends the Address Type in Listing 3-21

<complexType name="UKAddress" >
   <complexContent>
      <extension base="mh:Address">
         <sequence>
            <element name="postcode" type="string"/>
         </sequence>
      </extension>
   </complexContent>
</complexType>

We now have two types derived from the Address type, USAddress and UKAddress, which capture the addressing proper to their respective postal systems.[4]

Restriction

Restriction is very easy to understand. You simply redefine or omit those elements and attributes that change, and list all the other elements and attributes exactly as they were in the base type. For example, we can create a USAddress type that omits the city and state elements, as shown in Listing 3-23. (If you have a zip code you don't need a city and state, because any zip code can be cross-referenced to a specific city and state.)

Example 3-23. A Restriction of the USAddress Type Defined in Listing 3-21

<complexType name="BriefUSAddress">
  <complexContent>
    <restriction base="mh:USAddress">
      <sequence>
        <element name="name" type="string"/>
        <element name="street" type="string"/>
        <element name="zip" type="string"/>
      </sequence>
      <attribute name="category" type="string" default="business"/>
    </restriction>
  </complexContent>
</complexType>

In this example, the derived type, BriefUSAddress, contains the name, street, and zip elements, but not the city, state, and country elements, because the schema simply omits them. In addition we have redefined the occurrence constraints on the street element so that it may occur only once (recall that the default values of maxOccurs and minOccurs are both "1"). Compare BriefUSAddress to the Address base type in Listing 3-21, which defined the street element with a maxOccurs equal to "unbounded".

While the above paragraph is correct, there are some important limits on what you can do. You cannot omit an element from a restriction unless the parent type declared it to be optional (minOccurs="0"). In addition, the derived type's occurrence constraints cannot be less strict than those of its base type. For example, you cannot constrain an element to minOccurs="0" and maxOccurs="4" in the child if the parent's element is defined as minOccurs="1" and maxOccurs="2". The restricted occurrence attributes must fall within the boundaries defined by the parent type. For the BriefUSAddress in Listing 3-23 to work, we will need to redefine the Address and USAddress types in Listing 3-21 to make the city, country, and state elements optional (set minOccurs="0"); if we don't, the parser will report an error.
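These limits can be sketched with a small, self-contained schema. The BaseAddress and ShortAddress type names are hypothetical, chosen so as not to collide with the types in Listing 3-21; the point is only to show a restriction that stays within its base type's bounds.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema
  targetNamespace="http://www.Monson-Haefel.com/jwsbook"
  xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
  xmlns="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">

  <!-- Hypothetical base type: city and state are declared optional -->
  <complexType name="BaseAddress">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string" maxOccurs="unbounded"/>
      <element name="city" type="string" minOccurs="0"/>
      <element name="state" type="string" minOccurs="0"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>

  <!-- Legal restriction: omits only the optional elements and
       narrows street from one-or-more to at most two occurrences -->
  <complexType name="ShortAddress">
    <complexContent>
      <restriction base="mh:BaseAddress">
        <sequence>
          <element name="name" type="string"/>
          <element name="street" type="string" maxOccurs="2"/>
          <element name="zip" type="string"/>
        </sequence>
      </restriction>
    </complexContent>
  </complexType>
</schema>
```

Dropping the zip element from ShortAddress, or setting minOccurs="0" on its street element, would step outside the bounds set by BaseAddress, and a validating parser should reject the schema.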

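The necessity of repeating all the elements, even if they don't change, makes restriction a bit cumbersome, but it's the only logical way of indicating which elements are omitted or constrained. Attributes are inherited from the base type automatically, so repeating an unchanged attribute, as Listing 3-23 does with category, is permitted but not required.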

While restriction is useful, it's used less than extension because it doesn't map as well to programming languages. For this reason, it's risky to use restriction when defining complex types in your schemas.

Polymorphism and Abstract Base Types

The real power of extension, and of restriction for that matter, is that derived types can be used polymorphically with elements of the base type. In other words, you can use a derived type in an instance document in place of the base type specified in the schema.

For example, suppose we redefine the PurchaseOrder type to use the base Address type for its billAddress and shipAddress elements, instead of the USAddress type, as shown in Listing 3-24.

Example 3-24. Setting Up Polymorphism in a Schema

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
    targetNamespace="http://www.Monson-Haefel.com/jwsbook"
    elementFormDefault="qualified">

  <element name="address" type="mh:Address"/>
  <element name="purchaseOrder" type="mh:PurchaseOrder"/>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:Address"/>
      <element name="billAddress" type="mh:Address"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="float"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
  ...
</schema>

Because XML schema supports polymorphism, an instance document can now use any type derived from Address for the shipAddress and billAddress elements. For example, in Listing 3-25 the XML instance of PurchaseOrder uses BriefUSAddress for the billAddress element and UKAddress for the shipAddress element.

Example 3-25. Using Polymorphism in an XML Instance

<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
  xmlns="http://www.Monson-Haefel.com/jwsbook"
  xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.Monson-Haefel.com/jwsbook
                      http://www.Monson-Haefel.com/jwsbook/po2.xsd">

   <accountName>Amazon.com</accountName>
   <accountNumber>923</accountNumber>
   <shipAddress xsi:type="mh:UKAddress">
       <name>Amazon.co.uk</name>
       <street>Ridgmont Road</street>
       <city>Bedford</city>
       <country>United Kingdom</country>
       <postcode>MK43 0ZA</postcode>
   </shipAddress>
   <billAddress xsi:type="mh:BriefUSAddress">
      <name>Amazon.com</name>
      <street>1516 2nd Ave</street>
      <zip>90952</zip>
   </billAddress>
   <book>
      <title>Java Web Services</title>
      <quantity>300</quantity>
      <wholesale-price>24.99</wholesale-price>
   </book>
   <total>8997.00</total>
</purchaseOrder>

The xsi:type attribute explicitly declares the type of the element in the instance document. Explicitly declaring an element's type with xsi:type tells the parser to validate the element against the derived type instead of the type declared in the schema. You can think of this as “casting” an element, similar to casting a value in Java. The xsi:type must be a type derived from the element's type declared in the schema.

The xsi:type belongs to the XML schema-instance namespace, which is defined by the XML schema specification for use in instance documents. It's the same namespace that's used for the schemaLocation attribute.

Abstract and Final Complex Types

You can declare complex types to be abstract much as you do Java classes. For example, although the Address type is a good base type for USAddress, UKAddress, and BriefUSAddress, it's too vague to be used directly in an instance document. To prevent such use, you can declare the type to be abstract. For example, if we add abstract="true" to the earlier definition of Address, as in the following snippet, it cannot be used directly in an instance document. A type derived from it must be used instead, identified in the instance document with the xsi:type attribute.

<complexType name="Address" abstract="true">
  <sequence>
    <element name="name" type="string"/>
    <element name="street" type="string" maxOccurs="unbounded"/>
    <element name="city" type="string"/>
    <element name="country" type="string"/>
  </sequence>
  <attribute name="category" type="string" default="business"/>
</complexType>
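As a sketch of what this means for an instance document, assume the schema in Listing 3-21 with the abstract Address definition above substituted in. The global address element is still declared with the Address type, so the instance has to name a concrete derived type; the country value is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<address
  xmlns="http://www.Monson-Haefel.com/jwsbook"
  xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:type="mh:USAddress">
  <!-- Address is abstract, so xsi:type selects the concrete USAddress type -->
  <name>AMAZON.COM</name>
  <street>1850 Mercer Drive</street>
  <city>Lexington</city>
  <country>United States</country>
  <state>KY</state>
  <zip>40511</zip>
</address>
```

Without the xsi:type attribute, a validating parser should reject this document, because the element would have to be validated against the abstract Address type itself.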

You can also declare complex types to be final, just as Java classes can be final, to prevent a complex type from being used as a base type for restriction or extension. The possible values for the final attribute are "restriction", "extension", and "#all".

For example, we can declare the USAddress type defined in Listing 3-21 to be “final by extension,” which prevents it from being extended but allows restriction.

<complexType name="USAddress" final="extension">
  <complexContent>
    <extension base="mh:Address">
      <sequence>
        <element name="state" type="string"/>
        <element name="zip" type="string"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

If a type is declared final="restriction", it can be extended but not restricted. If the final attribute equals "#all", the type cannot be used as a base type at all.
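A minimal sketch of the "#all" case follows. Applying it to UKAddress is an assumption made for illustration, and the Address base type is shortened to keep the example small.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema
  targetNamespace="http://www.Monson-Haefel.com/jwsbook"
  xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
  xmlns="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">

  <!-- Shortened base type, for illustration only -->
  <complexType name="Address">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string" maxOccurs="unbounded"/>
    </sequence>
  </complexType>

  <!-- final="#all": UKAddress may derive from Address, but no type
       may use UKAddress as the base of an extension or a restriction -->
  <complexType name="UKAddress" final="#all">
    <complexContent>
      <extension base="mh:Address">
        <sequence>
          <element name="postcode" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</schema>
```

Note that final constrains types derived from UKAddress, not UKAddress itself, which is why it can still extend Address.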

Inheritance of Simple Types

The built-in simple types are atomic and very restrictive, so they are an excellent foundation for validating data. For example, an unsignedShort type cannot contain letters (only digits), it cannot contain a decimal point, and its value must be between 0 and 65,535. That's pretty restrictive, but what if it's not restrictive enough? XML schema allows us to create new simple types that are derived from existing simple types in order to constrain further the range of possible values that a simple type may represent.

For example, PurchaseOrder declares the total element as an XML schema float type, which means it can contain any decimal value that can be represented with 32 bits of precision. That's a huge range of values, which includes very large negative and positive numbers. For example, both 2,093,020.99 and –24.9941 are valid float values. Monson-Haefel Books wants to limit the value of the total element to a much smaller range: any dollar amount between $0.00 and $100,000.00, a normal range of values used in purchase orders.

To constrain data in the total element to this range, we restrict the built-in float type to create a new type called Total, as shown in Listing 3-26.

Example 3-26. Defining a Simple Type

<simpleType name="Total">
  <restriction base="float">
    <minInclusive value="0"/>
    <maxExclusive value="100000"/>
  </restriction>
</simpleType>

We declare the new Total simple type with the simpleType schema element. The restriction element enables us to limit the range of an existing type, as well as determine its format.

The restriction element for simple types contains one or more facet elements. A facet is an element that represents an aspect or characteristic of the built-in type that can be modified. For example, the Total simple type declares that its minInclusive facet is "0" and its maxExclusive facet is "100000", thereby specifying that values held by elements of this type must be at least zero and less than 100,000.

The XML schema specification defines several facets you may use when restricting a float type. The modifiable facets for float are shown in Table 3-6.[5]

Table 3-6. Float Facets

Float Facet      Meaning
maxInclusive     The inclusive upper bound. The value may not exceed this amount.
maxExclusive     The exclusive upper bound. The value must be less than this amount.
minInclusive     The inclusive lower bound. The value must be at least this amount.
minExclusive     The exclusive lower bound. The value must be greater than this amount.
pattern          The format of the value, defined using a regular expression.
enumeration      The set of allowed values.

You can use the Total type in PurchaseOrder or elsewhere, just as you can a built-in type. Listing 3-27 shows the PurchaseOrder type using the new Total simple type.

Example 3-27. Using Derived Simple Types in a Schema

<schema
  xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
  targetNamespace="http://www.Monson-Haefel.com/jwsbook"
  elementFormDefault="qualified" >
  ...
  <simpleType name="Total">
    <restriction base="float">
      <minInclusive value="0.00"/>
      <maxExclusive value="100000.00"/>
    </restriction>
  </simpleType>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:Address"/>
      <element name="billAddress" type="mh:Address"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="mh:Total"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
  ...

</schema>

There are many kinds of facets, and each built-in type is assigned a subset of facets, which can be used to create new simple types. A complete list of facets for each data type can be found in XML Schema Part 2: Data Types.[6]
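To illustrate how the available facets differ by base type, here is a small sketch that restricts a string with length facets and an unsignedShort with range facets. The AccountName and AccountNumber type names and their limits are hypothetical; they do not appear in the Monson-Haefel Books schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema
  xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
  targetNamespace="http://www.Monson-Haefel.com/jwsbook">

  <!-- Hypothetical type: a string between 1 and 40 characters long -->
  <simpleType name="AccountName">
    <restriction base="string">
      <minLength value="1"/>
      <maxLength value="40"/>
    </restriction>
  </simpleType>

  <!-- Hypothetical type: an unsignedShort limited to three-digit values -->
  <simpleType name="AccountNumber">
    <restriction base="unsignedShort">
      <minInclusive value="100"/>
      <maxInclusive value="999"/>
    </restriction>
  </simpleType>
</schema>
```

The length facets apply to string but not to numeric types such as unsignedShort or float, while the range facets apply to the numeric types but not to string.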

The pattern Facet

Most built-in types support the pattern facet, which is very powerful. While other facets are pretty self-explanatory, the pattern facet will look strange if you've never worked with regular expressions before. In XML schema, a regular expression is used to verify that the contents of an element or attribute adhere to a predefined character pattern.

For example, in addition to restricting the range of the Total type defined in Listing 3-27 to values between 0 and 100,000, we can declare a pattern facet to limit fractional amounts to two digits after the decimal point, as is conventional for dollar amounts.

<simpleType name="Total">
   <restriction base="float">
      <pattern value="[0-9]+\.[0-9]{2}" />
      <minInclusive value="0"/>
      <maxExclusive value="100000" />
   </restriction>
</simpleType>

The regular expression "[0-9]+\.[0-9]{2}" specifies that there must be at least one digit before the decimal point and exactly two digits following it. The backslash escapes the period, which would otherwise match any character rather than a literal decimal point. The following table shows valid and invalid values for the Total type.

Valid Values     Invalid Values
0.00             .00
0.10             0.1
1.01             -1.00
99999.99         100001.00

The pattern facet is commonly applied to string types. For example, we can define a USZipCode type that restricts a string value either to five digits, or to nine digits with the last four set off by a hyphen. Listing 3-28 illustrates.

Example 3-28. Using the pattern Facet

<simpleType name="USZipCode">
   <restriction base="string">
      <pattern value="[0-9]{5}(-[0-9]{4})?" />
   </restriction>
</simpleType>

We could modify Listing 3-21 as in the following snippet to use the USZipCode simple type for the USAddress and BriefUSAddress types, to provide stronger validation of U.S. addresses.

<complexType name="USAddress" final="extension">
  <complexContent>
    <extension base="mh:Address">
      <sequence>
        <element name="state" type="string"/>
        <element name="zip" type="mh:USZipCode"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
...
<complexType name="BriefUSAddress">
  <complexContent>
    <restriction base="mh:USAddress">
      <sequence>
        <element name="name" type="string"/>
        <element name="street" type="string"/>
        <element name="zip" type="mh:USZipCode"/>
      </sequence>
      <attribute name="category" type="string"
                 default="business"/>
    </restriction>
  </complexContent>
</complexType>

Appendix B provides an overview of schema regular expressions. Readers already familiar with regular expressions may also find this appendix valuable because XML schema's regular-expression syntax has some small but important differences from that of other languages or tools (Perl, for example).

The enumeration Facet

The enumeration facet restricts the value of any simple type (except boolean) to a set of distinct values. For example, we can create a new USState type, which restricts the value of a string type to two-letter state abbreviations as shown in Listing 3-29.[7]

Example 3-29. Defining an Enumeration

<simpleType name="USState">
   <restriction base="string">
      <enumeration value="AK"/> <!-- Alaska   -->
      <enumeration value="AL"/> <!-- Alabama  -->
      <enumeration value="AR"/> <!-- Arkansas -->
      <!-- and so on -->
   </restriction>
</simpleType>

We can then modify Listing 3-21 to use the USState enumeration type in the state element of the USAddress type, in order to constrain its value to valid U.S. state abbreviations.

<complexType name="USAddress" final="extension">
  <complexContent>
    <extension base="mh:Address">
      <sequence>
        <element name="state" type="mh:USState"/>
        <element name="zip" type="mh:USZipCode"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
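The effect on instance documents can be sketched as follows. The values are illustrative, and the sketch assumes the USState enumeration from Listing 3-29; enumeration values of a string-based type are compared exactly, so case matters.

```xml
<!-- valid: matches an enumerated value -->
<state>AK</state>

<!-- invalid: the comparison is case-sensitive -->
<state>ak</state>
<!-- invalid: not one of the enumerated abbreviations -->
<state>Alaska</state>
```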

List and Union Types

The simple types we have examined thus far are all atomic, which means that each one represents a single piece of data. For example, although the name element of Address may contain spaces (e.g., <name>Richard W. Monson-Haefel</name>), the string value is still considered one piece of data. List types, however, allow us to define elements or attributes that contain multiple pieces of data separated by white space, and union types allow a value to conform to any one of several simple types.

While list types are supported by many Web service platforms, union types are not. Union and list types should be used with care, especially when interoperability across programming environments is important.

List Types

A list is a sequence of simple-type values separated by white space. For example, you can define a USStateList type to contain several USState type values, as shown in Listing 3-30.

Example 3-30. Defining a List Type

<simpleType name="USStateList">
    <list itemType="mh:USState"/>
</simpleType>

In an instance document, an element of the USStateList type could contain zero or more state abbreviations separated by spaces.

<list-of-states>CA  NY  FL  AR  NH</list-of-states>

A list type may have length, minLength, maxLength, pattern, and enumeration facets. The length facets control the number of items contained by the element or attribute, while the enumeration facet restricts the list as a whole to a fixed set of permitted values.
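For instance, a length restriction might be applied by deriving a new type from USStateList, as in the sketch below. The ShortUSStateList name and the limits of one and three items are hypothetical, chosen only for illustration.

```xml
<!-- hypothetical type: a list of one to three state abbreviations -->
<simpleType name="ShortUSStateList">
  <restriction base="mh:USStateList">
    <minLength value="1"/>
    <maxLength value="3"/>
  </restriction>
</simpleType>
```

With this type, <list-of-states>CA NY FL</list-of-states> would be acceptable, while an empty list or a list of four abbreviations would not.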

A list type can be based on any simple type, built-in or derived, but not on other list types or on complex types. XML schema defines a built-in list type called NMTOKENS. NMTOKENS is a list of the NMTOKEN simple type, which is a string without spaces, line feeds, or tabs (see Table 3-2). For compatibility with DTDs, NMTOKENS should be used only with attributes.

List types should be based on simple types that do not have spaces because the parser assumes that spaces separate values in the list. NMTOKENS is recommended for lists of attributes. For elements, a list type based on the token type (see Table 3-2) or simple types with no spaces, such as USState and USZipCode, is strongly recommended.
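These recommendations might be applied as in the following sketch. The USZipCodeList type and the keywords attribute are hypothetical names introduced here for illustration; they are not part of the Purchase Order schema.

```xml
<!-- for element content: a list whose item type never contains spaces -->
<simpleType name="USZipCodeList">
  <list itemType="mh:USZipCode"/>
</simpleType>

<!-- for an attribute: the built-in NMTOKENS list type -->
<attribute name="keywords" type="NMTOKENS"/>
```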

Union Types

A union type combines several simple types, called member types; a value is valid if it conforms to any one of them. It's a lot like a list type, except it can accommodate more than one kind of simple type. For example, the union type USStateOrZipUnion allows the value to be either a USStateList type or a USZipCode type, as shown in Listing 3-31.

Example 3-31. Defining a Union Type

<simpleType name="USStateOrZipUnion">
   <union memberTypes="mh:USStateList mh:USZipCode"/>
</simpleType>

An element or attribute based on this type can hold either a USStateList or a USZipCode. It cannot, however, contain a mix of values. In other words, a USStateOrZipUnion can contain a list of state codes or a single zip code, but not a mix of states and zip codes or more than one zip code. In the following example, valid and invalid values are shown for the hypothetical location element of type USStateOrZipUnion.

<!-- valid use of union type -->
<location>CA NJ AK</location>
<location>94108</location>

<!-- invalid use of union type -->
<location>94108 CA 55401 MN</location>
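The location element used in these examples is hypothetical. A declaration for it might look like the following sketch, which assumes the mh prefix is bound to the schema's target namespace as in the earlier listings.

```xml
<!-- hypothetical element whose value is either a state list or a single zip code -->
<element name="location" type="mh:USStateOrZipUnion"/>
```

A validating parser checks the value against the member types in the order in which they appear in the memberTypes attribute and uses the first one that accepts it.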

Anonymous Types

You can combine an element declaration with a complex or simple type declaration to create an anonymous type. An anonymous type is not named and cannot be referred to outside the element that declares it. For example, throughout this chapter the Purchase Order schema has defined a PurchaseOrder type and a purchaseOrder element separately, as shown in the following snippet from Listing 3-13.

<element name="purchaseOrder" type="mh:PurchaseOrder"/>

<complexType name="PurchaseOrder">
  <sequence>
    <element name="accountName" type="string"/>
    <element name="accountNumber" type="unsignedShort"/>
    <element name="shipAddress" type="mh:Address"/>
    <element name="billAddress" type="mh:Address"/>
    <element name="book" type="mh:Book"/>
    <element name="total" type="mh:Total"/>
  </sequence>
  <attribute name="orderDate" type="date"/>
</complexType>

The PurchaseOrder type is not very useful outside the purchaseOrder element, so we can combine the two declarations into one as in Listing 3-32.

Example 3-32. Defining an Anonymous Type

<element name="purchaseOrder">
  <complexType>
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:Address"/>
      <element name="billAddress" type="mh:Address"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="mh:Total"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
</element>

We've combined the definition of the PurchaseOrder type with the declaration of the purchaseOrder element. Notice that the element declaration doesn't need a type attribute because it defines its own type, and that the complexType declaration doesn't declare a name attribute; it's anonymous.

Anonymous types can simplify schemas, but they can also be abused if nested too deeply or applied indiscriminately. A balanced approach is better, using a combination of anonymous types and named types. For example, the purchaseOrder anonymous type can contain other anonymous types as well as named types. In Listing 3-33 the book and total elements are declared with nested anonymous types, while Address remains a named type that is defined elsewhere.

Example 3-33. Nesting Anonymous Types

<element name="purchaseOrder">
  <complexType>
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:Address"/>
      <element name="billAddress" type="mh:Address"/>
      <element name="book">
        <complexType>
          <sequence>
            <element name="title" type="string"/>
            <element name="quantity" type="unsignedShort"/>
            <element name="wholesale-price" type="float"/>
          </sequence>
        </complexType>
      </element>
      <element name="total">
        <simpleType>
          <restriction base="float">
            <minInclusive value="0"/>
            <maxExclusive value="100000"/>
            <pattern value="[0-9]+\.[0-9]{2}"/>
          </restriction>
        </simpleType>
      </element>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
</element>

Anonymous types can be based on complex or simple types. In this example, the total element is defined with an anonymous simple type that is derived by restriction from float.

Because anonymous types have no names, they cannot be referred to outside the element that defines them. Anonymous types are not reusable, and you should employ them only when you know that the type won't be useful in other schemas. For example, the book and total elements are based on anonymous types that might well be useful in other circumstances; you might benefit from defining them separately as named types. In the end it's a judgment call.

Importing and Including Schemas

You can combine schemas using two different elements, include and import. An import allows you to combine schemas from different namespaces, while an include lets you combine schemas from the same namespace.

Importing

A schema may import types from other schemas, allowing more modular schema design and type reuse. For example, we can define a separate schema and namespace for all the types related to mailing addresses: Address, USAddress, UKAddress, BriefUSAddress, USZipCode, and USState. This schema would define the complete Address Markup Language for Monson-Haefel Books. Listing 3-34 shows an abridged version of this schema.

Example 3-34. The Address Markup Schema

<?xml version="1.0" encoding="UTF-8" ?>
<schema
   targetNamespace="http://www.Monson-Haefel.com/jwsbook/ADDR"
   xmlns:addr="http://www.Monson-Haefel.com/jwsbook/ADDR"
   xmlns="http://www.w3.org/2001/XMLSchema">

  <element name="address" type="addr:Address"/>

  <simpleType name="USZipCode">
    <restriction base="string">
      <pattern value="[0-9]{5}(-[0-9]{4})?"/>
    </restriction>
  </simpleType>

  <simpleType name="USState">
    <restriction base="string">
      <enumeration value="AK"/> <!-- Alaska   -->
      <enumeration value="AL"/> <!-- Alabama  -->
      <enumeration value="AR"/> <!-- Arkansas -->
      <!-- and so on -->
    </restriction>
  </simpleType>

  <complexType name="Address" abstract="true">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string" maxOccurs="unbounded"/>
      <element name="city" type="string"/>
      <element name="country" type="string"/>
    </sequence>
    <attribute name="category" type="string" default="business"/>
  </complexType>

  <complexType name="USAddress" final="extension">
    <complexContent>
      <extension base="addr:Address">
        <sequence>
          <element name="state" type="addr:USState"/>
          <element name="zip" type="addr:USZipCode"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>

  <complexType name="UKAddress">
    <complexContent>
      <extension base="addr:Address">
        <sequence>
          <element name="postcode" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>

  <complexType name="BriefUSAddress">
    <complexContent>
      <restriction base="addr:USAddress">
        <sequence>
          <element name="name" type="string"/>
          <element name="street" type="string"/>
          <element name="zip" type="addr:USZipCode"/>
        </sequence>
        <attribute name="category" type="string" default="business"/>
      </restriction>
    </complexContent>
  </complexType>

</schema>

The targetNamespace of the Address Markup schema is "http://www.Monson-Haefel.com/jwsbook/ADDR", which is a separate namespace from that of the purchase-order elements. Because the PurchaseOrder type depends on the Address type, we'll need to import the Address Markup schema into the Purchase Order schema as in Listing 3-35.

Example 3-35. Importing a Schema

<?xml version="1.0" encoding="UTF-8" ?>
<schema
  targetNamespace="http://www.Monson-Haefel.com/jwsbook/PO"
  xmlns:po="http://www.Monson-Haefel.com/jwsbook/PO"
  xmlns:addr="http://www.Monson-Haefel.com/jwsbook/ADDR"
  xmlns="http://www.w3.org/2001/XMLSchema">

  <import namespace="http://www.Monson-Haefel.com/jwsbook/ADDR"
          schemaLocation="http://www.Monson-Haefel.com/jwsbook/addr.xsd"/>
  <element name="purchaseOrder" type="po:PurchaseOrder"/>
  <simpleType name="Total">
    <restriction base="float">
      <minInclusive value="0.00"/>
      <maxExclusive value="100000.00"/>
      <pattern value="[0-9]+\.[0-9]{2}"/>
    </restriction>
  </simpleType>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="addr:Address"/>
      <element name="billAddress" type="addr:Address"/>
      <element name="book" type="po:Book"/>
      <element name="total" type="po:Total"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
  <complexType name="Book">
    <sequence>
      <element name="title" type="string"/>
      <element name="quantity" type="unsignedShort"/>
      <element name="wholesale-price" type="float"/>
    </sequence>
  </complexType>
</schema>

The import mechanism enables you to combine schemas to create larger, more complex schemas. It's very useful when you see that some aspects of a schema, such as the address types, are reusable and need their own namespace and schema. The imported namespace needs to be assigned a prefix before we can use it. In this case, it's assigned the prefix addr in the root schema element.
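To see how the two namespaces come together, consider the following sketch of an instance document. It assumes the schemas are used exactly as listed, with no elementFormDefault attribute, so that only the global purchaseOrder element is namespace-qualified. The account, address, and book values are invented for illustration. Because the Address type is abstract, each address element names a concrete derived type with xsi:type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<po:purchaseOrder orderDate="2003-09-22"
    xmlns:po="http://www.Monson-Haefel.com/jwsbook/PO"
    xmlns:addr="http://www.Monson-Haefel.com/jwsbook/ADDR"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <accountName>Example Books</accountName>
  <accountNumber>923</accountNumber>
  <!-- Address is abstract, so a concrete type from the imported namespace is named here -->
  <shipAddress xsi:type="addr:USAddress">
    <name>Example Books Receiving</name>
    <street>1 Example Street</street>
    <city>Anchorage</city>
    <country>USA</country>
    <state>AK</state>
    <zip>99501</zip>
  </shipAddress>
  <billAddress xsi:type="addr:USAddress">
    <name>Example Books Accounts</name>
    <street>1 Example Street</street>
    <city>Anchorage</city>
    <country>USA</country>
    <state>AK</state>
    <zip>99501</zip>
  </billAddress>
  <book>
    <title>J2EE Web Services</title>
    <quantity>300</quantity>
    <wholesale-price>24.99</wholesale-price>
  </book>
  <total>7497.00</total>
</po:purchaseOrder>
```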

Including

In addition to the import element, there is another way of combining schemas called include, which can be used only to combine schemas with exactly the same targetNamespace. Including is useful when a schema becomes large and difficult to maintain. The Purchase Order schema has not become that unwieldy, but just as an example, we could place the definitions of the Total and Book types into a separate schema, then use an include element to combine them with the Purchase Order schema. Listing 3-36 shows a schema document for the Total and Book types, which we'll soon include in the Purchase Order schema.

Example 3-36. The Book and Total Schema

<?xml version="1.0" encoding="UTF-8" ?>
<schema
  targetNamespace="http://www.Monson-Haefel.com/jwsbook/PO"
  xmlns:po="http://www.Monson-Haefel.com/jwsbook/PO"
  xmlns="http://www.w3.org/2001/XMLSchema">

  <simpleType name="Total">
    <restriction base="float">
      <minInclusive value="0.00"/>
      <maxExclusive value="100000.00"/>
      <pattern value="[0-9]+\.[0-9]{2}"/>
    </restriction>
  </simpleType>

  <complexType name="Book">
    <sequence>
      <element name="title" type="string"/>
      <element name="quantity" type="unsignedShort"/>
      <element name="wholesale-price" type="float"/>
    </sequence>
  </complexType>
</schema>

Here the Book and Total types have been placed in their own schema document—but notice that the targetNamespace is the same as in the Purchase Order schema in Listing 3-35. We can combine these two schemas using an include statement. Listing 3-37 shows the use of both import and include.

Example 3-37. Using Import and Include Together

<?xml version="1.0" encoding="UTF-8" ?>
<schema
  targetNamespace="http://www.Monson-Haefel.com/jwsbook/PO"
  xmlns:po="http://www.Monson-Haefel.com/jwsbook/PO"
  xmlns:addr="http://www.Monson-Haefel.com/jwsbook/ADDR"
  xmlns="http://www.w3.org/2001/XMLSchema">

  <include
      schemaLocation="http://www.Monson-Haefel.com/jwsbook/po.xsd"/>

  <import namespace="http://www.Monson-Haefel.com/jwsbook/ADDR"
          schemaLocation="http://www.Monson-Haefel.com/jwsbook/addr.xsd"/>

  <element name="purchaseOrder" type="po:PurchaseOrder"/>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="addr:Address"/>
      <element name="billAddress" type="addr:Address"/>
      <element name="book" type="po:Book"/>
      <element name="total" type="po:Total"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
</schema>

Notice that we don't specify the namespace of the included schema, because it's expected to match the targetNamespace of the schema doing the including.

Wrapping Up

XML schema provides a standard typing system for defining markup languages and validating XML documents. SOAP, WSDL, and UDDI data structures are all defined in XML schema, so a good understanding of this technology is essential. There is a lot more to XML schema than this chapter covers; it would require an entire book to do the topic justice, but with this primer under your belt you are prepared to investigate new concepts by reading the W3C recommendation entitled “XML Schema” directly.

The W3C's XML schema recommendation is the last word on the topic, but it's not always an easy read. It's divided into three parts. The Primer, Part 0, is usually the best place to start when you need to learn about new features. It's a non-normative overview with examples. Part 1 covers the structure of schemas, and Part 2 defines concisely the XML schema data types. You can find these three documents at

http://www.w3.org/TR/xmlschema-0/

http://www.w3.org/TR/xmlschema-1/

http://www.w3.org/TR/xmlschema-2/

Although XML schema is the basis of Web services in J2EE, it's not the only schema language for XML available today. Others include DTDs (see Appendix A), Schematron, and RELAX NG, among others. Of these, Schematron appears to be the best complement to XML schema, or at least offers validation checks that XML schema cannot duplicate.

Schematron is based on XPath and XSLT and is used for defining context-dependent rules for validating XML documents. For example, in the purchase-order document you could use Schematron to ensure that the value of the total element equals the value of the quantity element multiplied by the value of the wholesale-price element, as shown in Listing 3-38.

Example 3-38. PurchaseOrder Instance Document

<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
     xmlns:mh="http://www.Monson-Haefel.com/jwsbook">
  ...
  <book>
    <title>J2EE Web Services</title>
    <quantity>300</quantity>
    <wholesale-price>24.99</wholesale-price>
  </book>
  <total>7497.00</total>
</purchaseOrder>

XML schema does not provide this type of business-rule support, so you may well want to use Schematron in combination with XML schema to provide more robust validation. You can find out more about Schematron at Rick Jelliffe's Web site, http://www.ascc.net/xml/schematron/.
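A rule of this kind might be sketched as follows. This is an illustration rather than a tested rule set: it assumes the ISO Schematron namespace (earlier Schematron versions used a different one) and assumes that the purchaseOrder element and its children are unqualified, as they appear in Listing 3-38. Both sides of the comparison are converted to whole cents and rounded to avoid floating-point mismatches.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <!-- the rule fires for every purchaseOrder element -->
    <rule context="purchaseOrder">
      <!-- compare in whole cents so rounding errors do not cause false failures -->
      <assert test="round(number(total) * 100) =
                    round(number(book/quantity) * number(book/wholesale-price) * 100)">
        The total must equal the quantity multiplied by the wholesale-price.
      </assert>
    </rule>
  </pattern>
</schema>
```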



[1] The Universal Character Set (ISO/IEC 10646-1993) is a superset of all other character codes, including UTF-8 and UTF-16.

[2] World Wide Web Consortium, “XML Schema Part 2: Datatypes”, W3C Recommendation, May 2, 2001. Available at http://www.w3.org/TR/xmlschema-2/.

[3] Whether you should is open to interpretation. For example, declaring the location of the XML Schema-Instance works with the Apache Xerces-J's SAX parser but not with Altova's XMLSpy (version 5, release 3).

[4] Actually, many UK addresses may not have a city, but we will ignore that detail in this example.

[5] Missing from this table is the whiteSpace facet, not shown because its value cannot be modified for a float type; it must always be "collapse".

[6] World Wide Web Consortium, “XML Schema Part 2: Datatypes”, W3C Recommendation, May 2, 2001. Available at http://www.w3.org/TR/xmlschema-2/.

[7] The USState type would also include the District of Columbia (D.C.), commonwealths (e.g., Puerto Rico), territories (e.g., Virgin Islands), as well as special codes for the U.S. armed services abroad (e.g., Armed Forces Europe).
