SOAP, WSDL, and UDDI are markup languages defined using the W3C XML Schema Language, so understanding the latter is critical to understanding J2EE Web Services. This chapter will provide you with a good understanding of both W3C XML Schema Language basics and, optionally, advanced concepts, so that you are ready to learn about SOAP, WSDL, and the UDDI standards covered later.
Throughout this chapter the term XML schema will be used to refer to the W3C XML Schema Language as a technology, while the word schema by itself will refer to a specific XML schema document.
The XML specification includes the Document Type Definition (DTD), which can be used to describe XML markup languages and to validate instances of them (XML documents). While DTDs have proven very useful over the years, they are also limited. To address limitations of DTDs, the W3C (World Wide Web Consortium), which manages the fundamental XML standards, created a new way to describe markup languages called XML schema.
DTDs have done an adequate job of telling us how elements and attributes are organized in a markup language, but they fail to address data typing.
For example, the DTD in Listing 3-1 describes the valid organization of the Address Markup Language we created earlier. The DTD declares that an address element must contain one or more street elements, followed by exactly one of each of the city, state, and zip elements. It also declares that the address element must have a category attribute.
Example 3-1. A DTD
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT address (street+, city, state, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ATTLIST address category CDATA #REQUIRED>
A parser reading an XML instance determines whether it's valid by comparing it to its DTD—if it declares that it uses a DTD. To be valid, an XML instance must conform to its DTD, which means it must use the elements specified by the DTD in the correct order and multiplicity (zero, one, or many times).
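To make the validation step concrete, here is a minimal sketch of an instance that declares its DTD and conforms to it. The file name address.dtd is an assumption made for this illustration; the element and attribute names come from Listing 3-1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The DOCTYPE declaration names the root element and points to the DTD.
     The file name address.dtd is hypothetical. -->
<!DOCTYPE address SYSTEM "address.dtd">
<address category="business">
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
</address>
```

Removing the category attribute, or placing zip before city, would make this instance invalid against the DTD.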
While constraints provided by DTDs are useful for validating XML instances, the probability that an XML instance will have a valid organization but contain invalid data is pretty high. DTDs have a very weak typing system that restricts elements to four broad types of data: EMPTY, ANY, element content, or mixed element-and-text content. In other words, DTDs can only restrict elements to containing nothing, other elements, or text, which is not a very granular typing system. DTDs don't support types like integer, decimal, boolean, and enumeration. For example, the Address Markup DTD cannot restrict the contents of the zip element to an integer value or the state element to a set of valid state codes.
XML schema, by contrast, provides a much stronger type system. Many believe that XML schema is superior to DTD because it defines a richer type system, which includes simple primitives (integer, double, boolean, among others) as well as facilities for more complex types. XML schema facilitates type inheritance, which allows simple or complex types to be extended or restricted to create new types. In addition, XML schema supports the use of XML namespaces to create compound documents composed of multiple markup languages.
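As a rough sketch of what this stronger typing makes possible, the following fragment restricts the built-in string type to create two new simple types. The type names USZipCode and USStateCode are hypothetical and do not appear in the Monson-Haefel schema, and only two state codes are listed for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook">
  <!-- Hypothetical type: five digits, optionally followed by a
       hyphen and a four-digit extension -->
  <simpleType name="USZipCode">
    <restriction base="string">
      <pattern value="[0-9]{5}(-[0-9]{4})?"/>
    </restriction>
  </simpleType>
  <!-- Hypothetical type: a real schema would enumerate every state code -->
  <simpleType name="USStateCode">
    <restriction base="string">
      <enumeration value="KY"/>
      <enumeration value="WA"/>
    </restriction>
  </simpleType>
</schema>
```

An element declared with type="mh:USStateCode" would accept WA but reject Washington, which is exactly the kind of constraint a DTD cannot express.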
Appendix A explains XML DTDs, but understanding the DTD schema language is not necessary for this book.
A schema describes an XML markup language. Specifically it defines which elements and attributes are used in a markup language, how they are ordered and nested, and what their data types are.
A schema describes the structure of an XML document in terms of complex types and simple types. Complex types describe how elements are organized and nested. Simple types are the primitive data types contained by elements and attributes. For example, Listing 3-2 shows a portion of a schema that describes the Monson-Haefel Markup Language. Monson-Haefel Markup defines a set of XML schema types used by Monson-Haefel Books: USAddress
, PurchaseOrder
, Invoice
, Shipping
, and the like. At this point all the different types used by Monson-Haefel Books are combined into one schema; later you'll learn how to separate them into their own schemas and independent markup languages.
Example 3-2. The Address Definition in a Schema
<?xml version="1.0" encoding="UTF-8"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
    targetNamespace="http://www.Monson-Haefel.com/jwsbook">
  <element name="address" type="mh:USAddress" />
  <complexType name="USAddress">
    <sequence>
      <element name="name" type="string" />
      <element name="street" type="string" />
      <element name="city" type="string" />
      <element name="state" type="string" />
      <element name="zip" type="string" />
    </sequence>
  </complexType>
  ...
</schema>
The first thing you may have noticed is that Listing 3-2 is actually an XML document. That schemas are XML documents is a critical point: It makes the development of validating parsers and other software tools easier, because the operations that manipulate schemas can be based on XML parsers, which are already widely available. DTDs, the predecessor to schemas, were not based on XML, so processing them required special parsing.
The root element of a schema document is always the schema element. Nested within the schema element are element and type declarations. Listing 3-2 declares a complex type named USAddress, and an element of that type named address.
The schema element assigns the XML schema namespace ("http://www.w3.org/2001/XMLSchema") as the default namespace. This namespace is the standard namespace defined by the XML schema specification; all the XML schema elements must belong to this namespace. The schema element also defines the targetNamespace attribute, which declares the XML namespace of all new types explicitly created within the schema. For example, the USAddress type is automatically assigned to the targetNamespace, "http://www.Monson-Haefel.com/jwsbook".
The schema element also uses an XML namespace declaration to assign the prefix mh to the targetNamespace. Subsequently, newly created types in the schema can be referred to as "mh:Typename". For example, the type attribute in the element declaration in Listing 3-2 refers to the USAddress type as "mh:USAddress":
<element name="address" type="mh:USAddress" />
An instance document based on this schema would use the address element directly or refer to the USAddress type. When a parser that supports XML schema reads the document, it can validate the contents of the XML document against the USAddress type definition in Listing 3-2. Listing 3-3 shows a conforming XML instance.
Example 3-3. An Instance of the Address Markup Language
<?xml version="1.0" encoding="UTF-8"?>
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
</addr:address>
Using XML schema, we can state exactly how an instance of the address
element should be organized and the types of data its elements and attributes should contain.
A simple type resembles a Java primitive type in that both are atomic; they cannot be broken down into constituent parts. In other words, a simple element type will not contain other elements; it will contain only data. The XML schema specification defines many standard simple types, called built-in types. The built-in types are the standard building blocks of an XML schema document. They are members of the XML schema namespace, "http://www.w3.org/2001/XMLSchema"
.
Table 3-1. Comparing the Use of XML Schema Simple Types and Java Primitive Types
XML Schema Built-in Simple Types (shown in bold) | Java Primitive Types (shown in bold) |
---|---|
<?xml version="1.0" encoding="UTF-8"?> <schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:mh="http://www.Monson-Haefel.com/jwsbook" targetNamespace="http://www.Monson-Haefel.com/jwsbook"> ... <complexType name="PurchaseOrder"> <sequence> <element name="accountName" type="string" /> <element name="accountNumber" type="integer" /> <element name="total" type="float" /> <!-- More stuff follows --> </sequence> </complexType> ... </schema> | package com.monsonhaefel.jwsbook; public class PurchaseOrder { String accountName; int accountNumber; float total; // more stuff follows } |
The PurchaseOrder complex type declares three of its elements using the XML schema built-in types: string, integer, and float. These simple types are similar to familiar types in the Java programming language and others. In a schema, simple types are used to construct complex types, much as Java primitives are used as fields of Java class definitions. Table 3-1 provides a comparison. The next section explains complex types in more detail.
The XML schema specification describes its 44 built-in simple types in precise detail. This precision enables XML parsers to process the built-in types predictably and consistently, for the most part, and provides a solid foundation for creating your own complex and custom simple types.
For example, the XML schema specification tells us that a string is defined as an unlimited length of characters based on the Universal Character Set;[1] an unsignedShort is a non-decimal number between 0 and 65,535; a float is a 32-bit floating-point type; and a date is represented as YYYY-MM-DD.
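The fragment below sketches what legal values of these four types look like in an instance. The element names, including the sample wrapper, are hypothetical; the comments name the built-in type each element is assumed to be declared with.

```xml
<sample>
  <title>J2EE Web Services</title>    <!-- string -->
  <quantity>300</quantity>            <!-- unsignedShort: 0 to 65,535 -->
  <total>8997.00</total>              <!-- float -->
  <orderDate>2003-09-22</orderDate>   <!-- date: YYYY-MM-DD -->
</sample>
```

A quantity of 70000 or an orderDate of 09/22/2003 would be rejected by a validating parser, because neither is a legal value of the declared type.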
You can find complete and concise definitions of all the built-in types in XML Schema Part 2: Datatypes.[2] Table 3-2 provides a partial list, with brief definitions in plain English.
Table 3-2. A Subset of the XML Schema Built-in Simple Types
Simple Type | Definition |
---|---|
string | A sequence of characters conforming to UCS |
normalizedString | A string without carriage returns, line feeds, or tabs |
token | A string without carriage returns, line feeds, tabs, leading or trailing spaces, or consecutive spaces |
NMTOKEN | A token used in attributes |
byte | A non-decimal number between –128 and 127 |
unsignedByte | A non-decimal number between 0 and 255 |
base64Binary | Base64-encoded binary data (RFC 2045)[a] |
hexBinary | Hex-encoded binary data[b] |
integer | A base-10 integer of any size (…)[c] |
positiveInteger | A base-10 integer greater than zero (1, 2, …) |
negativeInteger | A base-10 integer less than zero (…, –2, –1) |
int | A base-10 integer between –2,147,483,648 and 2,147,483,647 (–2 billion and 2 billion) |
unsignedInt | A base-10 integer between 0 and 4,294,967,295 (zero and 4 billion) |
long | A base-10 integer between –9,223,372,036,854,775,808 and 9,223,372,036,854,775,807 (–9 quintillion and 9 quintillion) |
unsignedLong | A base-10 integer between 0 and 18,446,744,073,709,551,615 (zero and 18 quintillion) |
short | A base-10 integer between –32,768 and 32,767 |
unsignedShort | A base-10 integer between 0 and 65,535 |
decimal | A decimal number of any precision and size |
float | A decimal number conforming to the IEEE single-precision 32-bit floating-point type[d] |
double | A decimal number conforming to the IEEE double-precision 64-bit floating-point type[d] |
boolean | A boolean value of true or false. You can also use the values 0 (false) or 1 (true) |
time | A time in hours, minutes, seconds, and milliseconds formatted as hh:mm:ss.sss (e.g., 1:20 PM is 13:20:00). You may include the optional Coordinated Universal Time (UTC) designator (e.g., 1:20 PM Eastern Standard Time (EST) is 13:20:00-05:00)[e] |
date | A Gregorian date in centuries, years, months, and days (e.g., December 31, 2004 is 2004-12-31)[e] |
dateTime | A Gregorian date measured in centuries, years, months, and days, with a time field set off by a T (e.g., 1:20 PM EST on December 31, 2004 would be 2004-12-31T13:20:00-05:00)[e] |
duration | A span of time measured in years, months, days, and seconds (e.g., 1 year, 2 months, 3 days, 10 hours, and 30 minutes would be P1Y2M3DT10H30M). Duration may be negative, and zero values can be left off (e.g., 120 days earlier is -P120D). The value must always start with the letter P, preceded by a minus sign if it is negative.[f] |

[a] N. Freed and N. Borenstein, “RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies” (1996). Available at http://www.ietf.org/rfc/rfc2045.txt.
[b] A very good explanation of the hexadecimal numbering system can be found at http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/ch01/CH01-2.html#HEADING2-1.
[c] Computers can't actually support infinite numbers, so the XML schema specification requires that the parser support at least 18 digits, which is a pretty huge number.
[d] Institute of Electrical and Electronics Engineers, “IEEE Standard for Binary Floating-Point Arithmetic”. See http://standards.ieee.org/reading/ieee/std_public/description/busarch/754-1985_desc.html.
[e] International Organization for Standardization (ISO). “Representations of dates and times” (1988).
[f] The duration type is defined in the XML schema specification and is not based on ISO's “Representations of dates and times”.
All built-in simple and complex types are ultimately derived from anyType, which is the ultimate base type, like the Object class in Java. The XML Schema Part 2: Datatypes specification offers a diagram of the data type hierarchy; see Figure 3-1.
A schema may declare complex types, which define how elements that contain other elements are organized. The USAddress schema type in Listing 3-2, for example, is a complex type definition for a United States postal address. It tells us that an element based on this type will contain five other elements called name, street, city, state, and zip.
A complex type is analogous to a Java class definition with fields but no methods. The fields in a Java class declare the names and types of variables that an instance of that class will contain. Similarly, a complex type declares the names and types of elements and attributes that an XML instance of that type may contain. An instance of a complex type is an element in an XML document. Table 3-3 compares an XML schema type and a Java class definition for a U.S. address.
Table 3-3. Comparing XML Schema Complex Types to Java Class Definitions
XML Schema: Complex Type | Java Class Definition |
---|---|
<complexType name="USAddress"> <sequence> <element name="name" type="string" /> <element name="street" type="string" /> <element name="city" type="string" /> <element name="state" type="string" /> <element name="zip" type="string" /> </sequence> </complexType> | public class USAddress { public String name; public String street; public String city; public String state; public String zip; } |
While this analogy between XML schema complex types and Java class definitions is helpful, take care not to confuse them. A schema is used to define elements and attributes in a markup language and verify the correctness of an XML instance; it's not a computer program.
Most complexType declarations in schemas will contain a sequence element that lists one or more element definitions. The element definitions tell you which elements are nested in the type, the order in which they appear, and the kind of data each element contains.
The USAddress type clearly defines the proper structure of a U.S. postal address and can be used to verify the proper contents of any element based on that type. For example, the address element used throughout Chapter 2 could be an instance of the type USAddress, and we could use that type to verify the contents of the address element when it was used in an XML instance. Table 3-4 shows the USAddress type alongside the address element so you can see how a complex type definition maps to an XML instance.
A complex type may contain a sequence of elements that are simple types or other complex types. For example, we can define an element for a purchase-order document by adding a PurchaseOrder type to the Monson-Haefel Markup Language you saw in Listing 3-2. In Listing 3-4, the new PurchaseOrder type has two nested elements, billAddress and shipAddress, both of type USAddress.
Example 3-4. The PurchaseOrder Type in a Schema
<?xml version="1.0" encoding="UTF-8"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
    targetNamespace="http://www.Monson-Haefel.com/jwsbook">
  <element name="purchaseOrder" type="mh:PurchaseOrder" />
  <element name="address" type="mh:USAddress" />
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string" />
      <element name="accountNumber" type="unsignedShort" />
      <element name="shipAddress" type="mh:USAddress" />
      <element name="billAddress" type="mh:USAddress" />
      <element name="book" type="mh:Book" />
      <element name="total" type="float" />
    </sequence>
  </complexType>
  <complexType name="USAddress">
    <sequence>
      <element name="name" type="string" />
      <element name="street" type="string" />
      <element name="city" type="string" />
      <element name="state" type="string" />
      <element name="zip" type="string" />
    </sequence>
  </complexType>
  <complexType name="Book">
    <sequence>
      <element name="title" type="string" />
      <element name="quantity" type="unsignedShort" />
      <element name="wholesale-price" type="float" />
    </sequence>
  </complexType>
</schema>
Table 3-4. Mapping a Schema Complex Type to an XML Element
XML Schema: | XML Document: |
---|---|
<complexType name="USAddress"> <sequence> <element name="name" type="string" /> <element name="street" type="string" /> <element name="city" type="string" /> <element name="state" type="string" /> <element name="zip" type="string" /> </sequence> </complexType> | <address> <name>Amazon.com</name> <street>1516 2nd Ave</street> <city>Seattle</city> <state>WA</state> <zip>90952</zip> </address> |
The schema makes use of both complex types (PurchaseOrder, USAddress, and Book) and simple types (string, unsignedShort, and float).
The USAddress type is a member of the targetNamespace, so we refer to it by its fully qualified name, "mh:USAddress". (Recall that the targetNamespace is assigned the namespace prefix mh in the schema element.)
As you can see, the PurchaseOrder type takes full advantage of USAddress by using it to define both its billAddress and shipAddress elements. In this way, complex type declarations can build on other complex type definitions to create rich types that easily describe very complex XML structures. The PurchaseOrder type also uses Book, another complex type that describes the book being ordered.
The names of XML schema types are case-sensitive. When an element declares that it is of a particular type, it must specify both the namespace and the name of that type exactly as the type declares them.
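As a small illustration of this rule, the fragment below shows two element declarations that could appear inside the PurchaseOrder type. The misspelled reference is deliberate and is there only to show what fails.

```xml
<!-- Resolves: the prefix and the name match the USAddress definition exactly -->
<element name="shipAddress" type="mh:USAddress" />

<!-- Does not resolve: "usaddress" is a different name from "USAddress",
     so a schema processor would report an unknown type -->
<element name="billAddress" type="mh:usaddress" />
```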
In addition to sequences of elements, a complex type may also define its own attributes. For example, Listing 3-5 shows a new version of the PurchaseOrder
type that includes the definition of an orderDate
attribute.
Example 3-5. Adding an Attribute to a Complex Type
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schema
xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
targetNamespace="http://www.Monson-Haefel.com/jwsbook">
<element name="purchaseOrder" type="mh:PurchaseOrder"/>
<complexType name="PurchaseOrder">
<sequence>
<element name="accountName" type="string"/>
<element name="accountNumber" type="unsignedShort"/>
<element name="shipAddress" type="mh:USAddress"/>
<element name="billAddress" type="mh:USAddress"/>
<element name="book" type="mh:Book"/>
<element name="total" type="float"/>
</sequence>
<attribute name="orderDate" type="date"/>
</complexType>
<complexType name="USAddress">
<sequence>
<element name="name" type="string"/>
<element name="street" type="string"/>
<element name="city" type="string"/>
<element name="state" type="string"/>
<element name="zip" type="string"/>
</sequence>
</complexType>
<complexType name="Book">
<sequence>
<element name="title" type="string"/>
<element name="quantity" type="unsignedShort"/>
<element name="wholesale-price" type="float"/>
</sequence>
</complexType>
</schema>
The next code sample, Listing 3-6, shows a valid XML document based on the PurchaseOrder
type defined by the schema you saw in Listing 3-5. The XML document in Listing 3-6 would contain all the elements and the orderDate
attribute as described by the PurchaseOrder
complex type—and would be verifiable against that type.
Example 3-6. An Instance of the Schema in Listing 3-5
<?xml version="1.0" encoding="UTF-8"?>
<po:purchaseOrder orderDate="2003-09-22"
xmlns:po="http://www.Monson-Haefel.com/jwsbook">
<accountName>Amazon.com</accountName>
<accountNumber>923</accountNumber>
<shipAddress>
<name>AMAZON.COM</name>
<street>1850 Mercer Drive</street>
<city>Lexington</city>
<state>KY</state>
<zip>40511</zip>
</shipAddress>
<billAddress>
<name>Amazon.com</name>
<street>1516 2nd Ave</street>
<city>Seattle</city>
<state>WA</state>
<zip>90952</zip>
</billAddress>
<book>
<title>J2EE Web Services</title>
<quantity>300</quantity>
<wholesale-price>24.99</wholesale-price>
</book>
<total>8997.00</total>
</po:purchaseOrder>
The multiplicity of an element, the number of times it occurs in an instance document, is controlled by occurrence constraints, which are declared by the maxOccurs
and minOccurs
attributes. For example, we can enhance the USAddress
complex type by placing occurrence constraints on the street
element as shown in Listing 3-7.
Example 3-7. Using Occurrence Constraints
<complexType name="USAddress">
  <sequence>
    <element name="name" type="string" />
    <element name="street" type="string" minOccurs="1" maxOccurs="2" />
    <element name="city" type="string" />
    <element name="state" type="string" />
    <element name="zip" type="string" />
  </sequence>
</complexType>
The occurrence constraints specify that in any instance of USAddress the street element must be present at least once and at most twice. In other words, a USAddress can contain either one or two street elements. The default value for both maxOccurs and minOccurs is "1", so if these attributes are not specified the element must be present exactly once. Thus, by default, each USAddress must have exactly one name, city, state, and zip.
The minOccurs attribute may be "0", indicating that an element is optional, or any positive integer value that is less than or equal to the maxOccurs value. The maxOccurs value may be any positive integer greater than or equal to the minOccurs value.
minOccurs ≥ 0
maxOccurs ≥ minOccurs
You may also define a maxOccurs
value to be "unbounded"
to specify that the element may occur an unlimited number of times.
For example, suppose Monson-Haefel Books wants to avoid storing a billing address that is identical to the shipping address, and to allow customers to buy an unlimited number of books on a single order. We can redefine the PurchaseOrder
type, setting the occurrence constraints on the billAddress
and the book
elements as highlighted in Listing 3-8.
Example 3-8. Using the "unbounded" Occurrence Value
<complexType name="PurchaseOrder">
  <sequence>
    <element name="accountName" type="string" />
    <element name="accountNumber" type="unsignedShort" />
    <element name="shipAddress" type="mh:USAddress" />
    <element name="billAddress" type="mh:USAddress" minOccurs="0" />
    <element name="book" type="mh:Book" maxOccurs="unbounded" />
    <element name="total" type="float" />
  </sequence>
  <attribute name="orderDate" type="date" />
</complexType>
The billAddress
element is now optional. It may occur at most once, because its maxOccurs
value is "1"
by default, but it may also be omitted because its minOccurs
value is "0"
. The book
element must be present at least once because the default value of minOccurs
is "1"
, but it may be repeated many times because its maxOccurs
is "unbounded"
.
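To see these constraints at work, consider the following sketch of an instance that omits billAddress and orders two books. The second book title and the prices and total shown are illustrative values only; the schema does not check the arithmetic.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<po:purchaseOrder orderDate="2003-09-22"
    xmlns:po="http://www.Monson-Haefel.com/jwsbook">
  <accountName>Amazon.com</accountName>
  <accountNumber>923</accountNumber>
  <shipAddress>
    <name>AMAZON.COM</name>
    <street>1850 Mercer Drive</street>
    <city>Lexington</city>
    <state>KY</state>
    <zip>40511</zip>
  </shipAddress>
  <!-- billAddress is omitted, which minOccurs="0" permits -->
  <book>
    <title>J2EE Web Services</title>
    <quantity>300</quantity>
    <wholesale-price>24.99</wholesale-price>
  </book>
  <!-- A second book is allowed because maxOccurs is "unbounded";
       this title is a hypothetical example -->
  <book>
    <title>Enterprise JavaBeans</title>
    <quantity>100</quantity>
    <wholesale-price>19.99</wholesale-price>
  </book>
  <total>9496.00</total>
</po:purchaseOrder>
```

An instance with no book element at all would be invalid, because minOccurs for book is still "1".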
Attributes also have occurrence constraints, but they are different from those of elements. Instead of maxOccurs and minOccurs, attribute declarations specify the use occurrence constraint, which may be "required", "optional", or "prohibited", indicating that the attribute must, may, or must not be used, respectively. The default is "optional". An attribute might be "prohibited" if you want to stop the use of a particular attribute, perhaps one that is inappropriate or no longer in use.
In PurchaseOrder
we want to make the orderDate
attribute mandatory, so Listing 3-9 sets its use
occurrence constraint to "required"
.
Example 3-9. Declaring the use Value of an Attribute
<complexType name="PurchaseOrder">
<sequence>
<element name="accountName" type="string" />
<element name="accountNumber" type="unsignedShort" />
<element name="shipAddress" type="mh:USAddress" />
<element name="billAddress" type="mh:USAddress"
minOccurs="0" />
<element name="book" type="mh:Book"
maxOccurs="unbounded" />
<element name="total" type="float" />
</sequence>
<attribute name="orderDate" type="date" use="required" />
</complexType>
An attribute may also have a default value, to be assigned if no value is explicitly declared in the instance document. For example, the USAddress
type may include an attribute called category
that can have the value "business"
, "private"
, or "government"
. Almost all addresses used by Monson-Haefel Books are business addresses, so we set the default
for the category
attribute to "business"
in Listing 3-10.
Example 3-10. Declaring the Default Value of an Attribute
<complexType name="USAddress">
<sequence>
<element name="name" type="string" />
<element name="street" type="string" />
<element name="city" type="string" />
<element name="state" type="string" />
<element name="zip" type="string" />
</sequence>
<attribute name="category" type="string" default="business" />
</complexType>
The default
attribute can be used only when the use
attribute is "optional"
(recall that "optional"
is the default value for the use
attribute). It wouldn't make sense to declare a default
when the use
is "required"
or "prohibited"
. If the use
attribute is "required"
, there is no need for a default because the attribute must appear in the instance document. If the use
is "prohibited"
, the attribute's not allowed so there is no sense having a default value.
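The two small documents below sketch how the default behaves in practice, assuming the USAddress type from Listing 3-10. They are separate instance documents shown together for comparison, and the address data in the second one is made up for illustration.

```xml
<!-- Document 1: category is omitted, so a schema-aware parser
     treats the element as if category="business" had been written -->
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
</addr:address>

<!-- Document 2: an explicit value takes precedence over the default -->
<addr:address category="private"
    xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
  <name>Jane Doe</name>
  <street>100 Main Street</street>
  <city>Lexington</city>
  <state>KY</state>
  <zip>40511</zip>
</addr:address>
```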
An attribute may also be declared fixed. If the attribute is omitted from the XML instance document, the fixed value is assigned to it; if the attribute is present, its value must match the fixed value or the document is invalid. This feature is useful in rare situations where you want to force a particular attribute always to have the same value. For example, if a particular schema is assigned a version number, then that version number should be fixed for that schema (UDDI does this).
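A fixed attribute might be declared as in the following sketch. The attribute name version and the value "1.0" are assumptions made for this illustration; they are not part of the Monson-Haefel schema.

```xml
<complexType name="PurchaseOrder">
  <sequence>
    <!-- element declarations as in Listing 3-9 -->
  </sequence>
  <attribute name="orderDate" type="date" use="required" />
  <!-- Hypothetical attribute: an instance that omits it is treated
       as if version="1.0" had been written -->
  <attribute name="version" type="string" fixed="1.0" />
</complexType>
```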
Most of the time you'll base complex types on sequence elements, but occasionally you may want to use the all element. Unlike sequence, which defines the exact order of child elements, the XML schema all element allows the elements in it to appear in any order. Each element in an all group may occur once or not at all; no other multiplicity is allowed. In other words, minOccurs may be "0" or "1" (the default), and maxOccurs is always "1". Finally, only single elements may be used in an all group; it can't include other groupings like sequence or all. Listing 3-11 shows the schema for the address element using the all element grouping instead of sequence.
Example 3-11. Using the XML Schema all Element
<?xml version="1.0" encoding="UTF-8"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
    targetNamespace="http://www.Monson-Haefel.com/jwsbook">
  ...
  <complexType name="USAddress">
    <all>
      <element name="name" type="string" />
      <element name="street" type="string" />
      <element name="city" type="string" minOccurs="0"/>
      <element name="state" type="string" minOccurs="0"/>
      <element name="zip" type="string" />
    </all>
  </complexType>
  ...
</schema>
In Listing 3-11 the name
, street
, and zip
elements must be present in the instance document, but the city
and state
elements may be absent. The elements can be in any order, but none of the elements may occur more than once. Listing 3-12 shows a valid instance of the USAddress
type as defined using the all
element in Listing 3-11.
Example 3-12. An Instance of the Schema in Listing 3-11
<?xml version="1.0" encoding="UTF-8"?>
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
  <zip>90952</zip>
  <street>1516 2nd Ave</street>
  <name>Amazon.com</name>
</addr:address>
Notice the missing city
and state
elements and that the order of the elements is different from that in the type definition.
In addition to declaring simple and complex types, a schema may also declare global elements, which XML instance documents can refer to directly. Global elements are declared as direct children of the schema
element, rather than children of a complex type. For example, the following shows a portion of the schema defined in Listing 3-5, which declared the purchaseOrder
element (shown in bold) to be global.
<?xml version="1.0" encoding="UTF-8"?>
<schema
xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
targetNamespace="http://www.Monson-Haefel.com/jwsbook">
<element name="purchaseOrder" type="mh:PurchaseOrder"/>
<complexType name="PurchaseOrder">
<sequence>
<element name="accountName" type="string"/>
<element name="accountNumber" type="unsignedShort"/>
<element name="shipAddress" type="mh:USAddress"/>
<element name="billAddress" type="mh:USAddress"/>
<element name="book" type="mh:Book"/>
<element name="total" type="float"/>
</sequence>
<attribute name="orderDate" type="date"/>
</complexType>
...
</schema>
An XML document based on Listing 3-5 can use the purchaseOrder
element as in Listing 3-6.
<?xml version="1.0" encoding="UTF-8"?>
<po:purchaseOrder orderDate="2003-09-22"
    xmlns:po="http://www.Monson-Haefel.com/jwsbook">
  <accountName>Amazon.com</accountName>
  <accountNumber>923</accountNumber>
  <shipAddress>
  ...
</po:purchaseOrder>
The root element of a valid XML document must have a corresponding global element declaration in the schema. A schema may define more than one global element. For example, we can modify the schema for Monson-Haefel Books so that it declares two global elements: purchaseOrder
and address
. Listing 3-13 illustrates.
Example 3-13. Defining Multiple Element Declarations
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
    targetNamespace="http://www.Monson-Haefel.com/jwsbook">
  <element name="address" type="mh:USAddress"/>
  <element name="purchaseOrder" type="mh:PurchaseOrder"/>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:USAddress"/>
      <element name="billAddress" type="mh:USAddress"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="float"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
  <complexType name="USAddress">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
  ...
</schema>
The schema in Listing 3-13 allows you to create XML documents in which the purchaseOrder
element is the root, but it also allows you to create XML documents in which the address
element is the root. Listing 3-14 is an XML document that defines the address
element as its root element and conforms to the schema in Listing 3-13.
Example 3-14. An Address Document Based on the Monson-Haefel Books Schema
<?xml version="1.0" encoding="UTF-8"?>
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
  <name>AMAZON.COM</name>
  <street>1850 Mercer Drive</street>
  <city>Lexington</city>
  <state>KY</state>
  <zip>40511</zip>
</addr:address>
By declaring two different global elements in the Monson-Haefel Books schema (Listing 3-13), you effectively create two schema-verifiable markup languages: a Purchase Order Markup Language and a U.S. Address Markup Language. The implication is that a single schema can be used to validate two, or indeed many, different kinds of documents. XML schema also supports global attributes, which can be referred to anywhere in the schema and which provide a consistent attribute name and type across elements. An example of a standard global attribute is xml:lang, which any element can use to indicate the language used in an element's value ("es" for Spanish, "en" for English, and so on).
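As a minimal sketch of a global attribute, the fragment below declares a category attribute as a direct child of the schema element and then refers to it from the USAddress type with ref. Making category global is an assumption made for illustration; the listings in this chapter declare it locally.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook">
  <!-- Global attribute: a direct child of schema (assumed for illustration) -->
  <attribute name="category" type="string"/>
  <element name="address" type="mh:USAddress"/>
  <complexType name="USAddress">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
    <!-- Refers to the global declaration instead of declaring a new local attribute -->
    <attribute ref="mh:category"/>
  </complexType>
</schema>
```

Any other complex type in the schema could refer to mh:category in the same way, which is what keeps the attribute's name and type consistent across elements.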
Local elements are those declared within the scope of a complex type. In Listing 3-13 all the elements, except for purchaseOrder
and address
, are local elements, because they are declared within one complex type or another. Similarly, orderDate
is a local attribute. Table 3-5 illustrates.
Table 3-5. Global and Local Elements in Listing 3-13
Global Elements | Local Elements |
---|---|
purchaseOrder, address | accountName, accountNumber, shipAddress, billAddress, book, total, name, street, city, state, zip |
In a nutshell, global elements and attributes are declared as direct children of the schema
element, while local elements and attributes are not; they are the children of complex types.
In Section 2.2.2 you learned that elements can be qualified by a namespace, or unqualified; that is, elements in an XML document may or may not require QName prefixes. Global elements and attributes must always be qualified, which means that in an XML instance you must prefix them to form a QName. The exception is when a global element is a member of the default namespace, in which case it does not need a prefix, because all unprefixed elements are assumed to be part of the default namespace. The default namespace does not apply to attributes, so global attributes must always be prefixed.
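The sketch below illustrates the last point. It assumes a hypothetical variant of the schema that declares a global category attribute and requires namespace-qualified local elements (the elementFormDefault setting covered next). The global address element relies on the default namespace, but the global attribute still needs a prefix, so the same namespace is bound a second time to the mh prefix.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumes a schema with elementFormDefault="qualified" and a global
     category attribute referenced by USAddress (both hypothetical) -->
<address xmlns="http://www.Monson-Haefel.com/jwsbook"
         xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
         mh:category="business">
  <name>AMAZON.COM</name>
  <street>1850 Mercer Drive</street>
  <city>Lexington</city>
  <state>KY</state>
  <zip>40511</zip>
</address>
```

Writing category="business" without the prefix would place the attribute in no namespace at all, so it would not match the global declaration.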
While global elements and attributes must always be qualified, local elements may not need to be qualified. XML schema defines two attributes, elementFormDefault and attributeFormDefault, that determine whether local elements and attributes in an XML instance need to be qualified with a prefix or not. For example, the schema for the Address Markup Language can be modified to require namespace prefixes on all local elements in an XML instance, as in Listing 3-15.
Example 3-15. Declaring That Elements Must Be Namespace-Qualified
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook"
        elementFormDefault="qualified">
  <element name="address" type="mh:USAddress"/>
  <complexType name="USAddress">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
  ...
</schema>
When the elementFormDefault
attribute is set to "qualified"
, in any XML instance all the local elements in the targetNamespace
must be qualified with a prefix. For example, Listing 3-16 shows an XML instance that conforms to the schema in Listing 3-15.
Example 3-16. Qualified Local Elements in an XML Document
<?xml version="1.0" encoding="UTF-8"?>
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
  <addr:name>AMAZON.COM</addr:name>
  <addr:street>1850 Mercer Drive</addr:street>
  <addr:city>Lexington</addr:city>
  <addr:state>KY</addr:state>
  <addr:zip>40511</addr:zip>
</addr:address>
If, on the other hand, the value of elementFormDefault is "unqualified", only the global elements must be qualified. Listing 3-17 represents a valid XML instance when elementFormDefault is "unqualified". Notice that the address element is qualified with the addr prefix, but the local elements (name, street, city, state, and zip) are not.
Example 3-17. Unqualified Local Elements in an XML Document
<?xml version="1.0" encoding="UTF-8"?>
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook">
  <name>AMAZON.COM</name>
  <street>1850 Mercer Drive</street>
  <city>Lexington</city>
  <state>KY</state>
  <zip>40511</zip>
</addr:address>
The attributeFormDefault attribute works in exactly the same way. If the value is "qualified", then the local attributes in the targetNamespace must be qualified with a prefix. If attributeFormDefault is "unqualified", they are left unprefixed.
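As a sketch of the attribute case, the schema below adds a local category attribute to USAddress and sets attributeFormDefault to "qualified". Attaching category to USAddress is an assumption made for illustration; it is not part of Listing 3-15.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook"
        elementFormDefault="qualified"
        attributeFormDefault="qualified">
  <element name="address" type="mh:USAddress"/>
  <complexType name="USAddress">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
    <!-- Local attribute: with attributeFormDefault="qualified",
         instance documents must prefix it -->
    <attribute name="category" type="string"/>
  </complexType>
</schema>
```

An instance of this schema would write the root as <addr:address addr:category="business" ...>. An unprefixed category attribute belongs to no namespace, so a validating parser should reject it.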
The default value of both the elementFormDefault and the attributeFormDefault attributes is "unqualified", so if they're not used, the local attributes and elements of the targetNamespace are not qualified. All of the XML documents before Listing 3-14 were unqualified by default, which is why the global elements (address and purchaseOrder) had prefixes but the other elements did not.
If the XML document declares a default namespace, then all elements without prefixes are assigned to that namespace. This rule makes things tricky because unqualified elements are not supposed to be qualified, yet if there is a default namespace, then they are assigned to that namespace and are effectively qualified. As an exercise, can you explain why the XML document in Listing 3-18 is valid for the XML schema in Listing 3-15?
Listing 3-15 requires that all elements be qualified. Listing 3-18 declares the default namespace, which is the namespace automatically assigned to any element that is not prefixed, so even though the elements in Listing 3-18 are not prefixed, they are qualified and are therefore valid when checked against the XML schema in Listing 3-15.
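The same reasoning can be seen in a smaller document. The sketch below is an address document that declares the Monson-Haefel namespace as its default namespace. None of its elements carries a prefix, yet each one is in the target namespace, so it should validate against the schema in Listing 3-15. It is an illustration written for this explanation, not one of the chapter's numbered listings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The default namespace applies to address and to every unprefixed child element -->
<address xmlns="http://www.Monson-Haefel.com/jwsbook">
  <name>AMAZON.COM</name>
  <street>1850 Mercer Drive</street>
  <city>Lexington</city>
  <state>KY</state>
  <zip>40511</zip>
</address>
```

The same document would be rejected by a schema that leaves elementFormDefault at "unqualified", because the local elements would then be expected to have no namespace.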
You are free to configure your schemas any way you want, but I've found that it's generally less confusing if you require that all elements be namespace-qualified by setting elementFormDefault
equal to "qualified"
. That said, this book uses both qualified and unqualified local elements with abandon. You'll see this kind of inconsistency in your real-world development efforts, and it's best if you get used to thinking about local-element qualification early in your work with XML.
The whole point of schemas is that they define the grammar by which XML documents can be validated. In other words, schemas are used by parsers to verify that an XML document conforms to a specific markup language.
To validate an XML document against one or more schemas, you need to specify which schemas to use. You do so by identifying the schemas' locations, using the schemaLocation
attribute, which is an XML schema-instance attribute.
The XML document in Listing 3-18 uses this attribute to declare the location of the one schema it's based on.
Example 3-18. Using schemaLocation with XML documents
<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
  xmlns="http://www.Monson-Haefel.com/jwsbook"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.Monson-Haefel.com/jwsbook http://www.Monson-Haefel.com/jwsbook/po.xsd">
  <accountName>Amazon.com</accountName>
  <accountNumber>923</accountNumber>
  <shipAddress>
    <name>AMAZON.COM</name>
    <street>1850 Mercer Drive</street>
    <city>Lexington</city>
    <state>KY</state>
    <zip>40511</zip>
  </shipAddress>
  <billAddress>
    <name>Amazon.com</name>
    <street>1516 2nd Ave</street>
    <city>Seattle</city>
    <state>WA</state>
    <zip>90952</zip>
  </billAddress>
  <book>
    <title>J2EE Web Services</title>
    <quantity>300</quantity>
    <wholesale-price>24.99</wholesale-price>
  </book>
  <total>8997.00</total>
</purchaseOrder>
The second namespace declared in Listing 3-18, xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", is the XML schema-instance namespace, which is defined by the XML schema specification. The XML schema specification explicitly defines a few attributes belonging to this namespace, which can be used in XML documents, including the xsi:schemaLocation attribute. Another important attribute from the XML schema-instance namespace is xsi:type, which is addressed in Section 3.2.
The xsi:schemaLocation
attribute helps an XML processor locate the actual physical schema document used by the XML instance. Each schema is listed in an xsi:schemaLocation
attribute as a namespace-location pair, which associates a namespace with a physical URL. In Listing 3-18, the Monson-Haefel namespace, "http://www.Monson-Haefel.com/jwsbook"
, is associated with a schema file located at Monson-Haefel Books' Web site. You can use xsi:schemaLocation
to point at several schemas if you need to. For example, we can add the schema location for the XML schema-instance, as in Listing 3-19.
Example 3-19. Declaring Multiple Schema Locations
<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
  xmlns="http://www.Monson-Haefel.com/jwsbook"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.Monson-Haefel.com/jwsbook http://www.Monson-Haefel.com/jwsbook/po.xsd
                      http://www.w3.org/2001/XMLSchema-instance http://www.w3.org/2001/XMLSchema.xsd">
You use white space to separate the namespace and the location URL in each namespace-location pair, and to separate the pairs from each other. For readability, it's a good idea to use more white space between pairs than between a namespace and its location.
You don't actually need to specify the XML schema-instance schema location,[3] because it must be supported natively by any XML schema validating parser, but you should list any other schemas used in an XML document.
For the schemas identified by xsi:schemaLocation
to be useful, they must explicitly define themselves as belonging to one of the namespaces identified in the XML instance document. In this case the schema, Listing 3-12, belongs to the Monson-Haefel Books namespace, "http://www.Monson-Haefel.com/jwsbook"
, the same namespace specified by the instance document.
A schema can be located on the Internet, as the Monson-Haefel Books schema in Listing 3-18 is, or on a local hard drive. When using a local schema, specify the location relative to the directory in which the XML document is located. For example, Listing 3-20 shows a schema that's in the same local directory as the XML instance.
Example 3-20. Pointing to a Schema on a Local File System
<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
  xmlns="http://www.Monson-Haefel.com/jwsbook"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.Monson-Haefel.com/jwsbook po.xsd">
  <accountName>Amazon.com</accountName>
  <accountNumber>923</accountNumber>
It's important to note that the xsi:schemaLocation attribute is considered a “hint” by the XML schema specification, which means that XML parsers are not required to use the schema it identifies. A good parser will, however, and some, like Xerces-J, allow you to override the location programmatically. Overriding is useful if you want to avoid downloading the schema every time an XML document based on it is parsed; you can use a cached copy instead of the original.
The xsi:schemaLocation
attribute is usually declared in the root element of an XML document, but it doesn't have to be. You can declare it later in the document, as long as it's in the scope of the elements it applies to.
The key goal of Web services is interoperability, so choosing technologies and standards like XML, SOAP, and WSDL, which are supported by the majority of platforms, is critical. XML is the foundation of Web service interoperability, but even XML can trip you up if you're not careful, particularly the more advanced XML schema types. The painful truth is that XML schema is still new, and some Web service platforms do not support all of its features. That said, according to the WS-I Basic Profile 1.0, Web services must support all of the XML schema features, including those covered in this “Advanced” section.
XML schema supports type inheritance much as object-oriented programming languages do, but XML schema inheritance is actually more comprehensive than in most object-oriented languages. Unfortunately, the richness of XML schema inheritance can cause interoperability headaches.
Many Web service platforms map XML schema types to native primitive types, structures, and objects so that developers can manipulate XML data using constructs native to their programming environment. For example, JAX-RPC maps some of the XML schema built-in types to Java primitives, and basic complex types to Java beans. JAX-RPC can map most derived complex types to Java beans, but not all. Similar limitations are found in other platforms like .NET and SOAP::Lite for Perl. Most object-oriented languages do not support the full scope of inheritance defined by the XML schema specification. For this reason, you should use type inheritance in schemas with care.
Complex types can use two types of inheritance: extension and restriction. Both allow you to derive new complex types from existing complex types. Extension broadens a derived type by adding elements or attributes not present in the base type, while restriction narrows a derived type by omitting or constraining elements and attributes defined by the base type.
An extension type inherits the elements and attributes of its base type, and adds new ones. For example, we could redefine the USAddress
type to be an extension of a base type called Address
as shown in Listing 3-21.
Example 3-21. Using XML Schema Inheritance
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schema targetNamespace="http://www.Monson-Haefel.com/jwsbook"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
  <element name="address" type="mh:Address"/>
  <complexType name="Address">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string" maxOccurs="unbounded"/>
      <element name="city" type="string"/>
      <element name="country" type="string"/>
    </sequence>
    <attribute name="category" type="string" default="business"/>
  </complexType>
  <complexType name="USAddress">
    <complexContent>
      <extension base="mh:Address">
        <sequence>
          <element name="state" type="string"/>
          <element name="zip" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
  ...
</schema>
The complexContent and extension elements in Listing 3-21 tell us that USAddress extends Address. It adds the state and zip elements, so the USAddress type has a total of six elements. In an instance document the inherited elements come first, followed by the added ones: name, street, city, country, state, and zip.
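To show what an extended type looks like in an instance, here is a minimal sketch of an address document validated as a USAddress. Because Listing 3-21 declares the address element with the base Address type, the document names the derived type with the xsi:type attribute, which is explained later in this section. The country value is made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<addr:address xmlns:addr="http://www.Monson-Haefel.com/jwsbook"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:type="addr:USAddress">
  <!-- Elements inherited from Address come first -->
  <addr:name>AMAZON.COM</addr:name>
  <addr:street>1850 Mercer Drive</addr:street>
  <addr:city>Lexington</addr:city>
  <addr:country>USA</addr:country>
  <!-- Elements added by the USAddress extension follow -->
  <addr:state>KY</addr:state>
  <addr:zip>40511</addr:zip>
</addr:address>
```

The local elements are prefixed because Listing 3-21 sets elementFormDefault to "qualified". The category attribute is omitted, so it takes its default value of "business".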
The base type Address
defined in Listing 3-21 can be used to create other derived types as well. For example, we could extend it to define a United Kingdom address type, UKAddress
, as in Listing 3-22.
Example 3-22. A UK Address Type Extends the Address Type in Listing 3-21
<complexType name="UKAddress">
  <complexContent>
    <extension base="mh:Address">
      <sequence>
        <element name="postcode" type="string"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
We now have two types derived from the Address
type, USAddress
and UKAddress
, which capture the addressing proper to their respective postal systems.[4]
Restriction is very easy to understand. You simply redefine or omit those elements and attributes that change, and list all the other elements and attributes exactly as they were in the base type. For example, we can create a USAddress
type that omits the city
and state
elements, as shown in Listing 3-23. (If you have a zip code you don't need a city and state, because any zip code can be cross-referenced to a specific city and state.)
Example 3-23. A Restriction of the USAddress Type Defined in Listing 3-21
<complexType name="BriefUSAddress">
  <complexContent>
    <restriction base="mh:USAddress">
      <sequence>
        <element name="name" type="string"/>
        <element name="street" type="string"/>
        <element name="zip" type="string"/>
      </sequence>
      <attribute name="category" type="string" default="business"/>
    </restriction>
  </complexContent>
</complexType>
In this example, the derived type, BriefUSAddress
, contains the name
, street
, and zip
elements, but not the city
, state
, and country
elements, because the schema simply omits them. In addition we have redefined the occurrence constraints on the street
element so that it may occur only once (recall that the default values of maxOccurs
and minOccurs
are both "1"
). Compare BriefUSAddress
to the Address
base type in Listing 3-21, which defined the street
element with a maxOccurs
equal to "unbounded"
.
While the above paragraph is correct, there are some important limits on what you can do: You cannot omit an element from a restriction unless the parent type declared it to be optional (minOccurs="0"). In addition, the derived type's occurrence constraints cannot be less strict than those of its base type. For example, you cannot constrain an element to minOccurs="0" and maxOccurs="4" in the child if the parent's element is defined as minOccurs="1" and maxOccurs="2". The restricted occurrence attributes must fall within the boundaries defined by the parent type. For the BriefUSAddress in Listing 3-23 to work, we will need to redefine the types in Listing 3-21 to make every omitted element optional (set minOccurs="0"): city and country in Address, and state in USAddress. If we don't, the parser will report an error.
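A sketch of that redefinition follows. It repeats Address and USAddress from Listing 3-21 with minOccurs="0" on the elements that BriefUSAddress leaves out, then restates BriefUSAddress unchanged. Treat it as one plausible way to satisfy the rule rather than the schema used elsewhere in the chapter; note that making city and country optional in Address also loosens every other type derived from it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook"
        elementFormDefault="qualified">
  <complexType name="Address">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string" maxOccurs="unbounded"/>
      <!-- Optional so that a restriction may omit them -->
      <element name="city" type="string" minOccurs="0"/>
      <element name="country" type="string" minOccurs="0"/>
    </sequence>
    <attribute name="category" type="string" default="business"/>
  </complexType>
  <complexType name="USAddress">
    <complexContent>
      <extension base="mh:Address">
        <sequence>
          <!-- Optional so that BriefUSAddress may omit it -->
          <element name="state" type="string" minOccurs="0"/>
          <element name="zip" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
  <complexType name="BriefUSAddress">
    <complexContent>
      <restriction base="mh:USAddress">
        <sequence>
          <element name="name" type="string"/>
          <!-- maxOccurs defaults to 1, which is within the base type's "unbounded" -->
          <element name="street" type="string"/>
          <element name="zip" type="string"/>
        </sequence>
        <attribute name="category" type="string" default="business"/>
      </restriction>
    </complexContent>
  </complexType>
</schema>
```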
The necessity of repeating all the elements and attributes, even if they don't change, makes restriction a bit cumbersome, but it's the only logical way of indicating which elements and attributes are omitted or constrained.
While restriction is useful, it's used less than extension because it doesn't map as well to programming languages. For this reason, it's risky to use restriction when defining complex types in your schemas.
The real power of extension, and of restriction for that matter, is that derived types can be used polymorphically with elements of the base type. In other words, you can use a derived type in an instance document in place of the base type specified in the schema.
For example, suppose we redefine the PurchaseOrder
type to use the base Address
type for its billAddress
and shipAddress
elements, instead of the USAddress
type, as shown in Listing 3-24.
Example 3-24. Setting Up Polymorphism in a Schema
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook"
        elementFormDefault="qualified">
  <element name="address" type="mh:Address"/>
  <element name="purchaseOrder" type="mh:PurchaseOrder"/>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:Address"/>
      <element name="billAddress" type="mh:Address"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="float"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
  ...
</schema>
Because XML schema supports polymorphism, an instance document can now use any type derived from Address
for the shipAddress
and billAddress
elements. For example, in Listing 3-25 the XML instance of PurchaseOrder
uses BriefUSAddress
for the billAddress
element and UKAddress
for the shipAddress
element.
Example 3-25. Using Polymorphism in an XML Instance
<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
  xmlns="http://www.Monson-Haefel.com/jwsbook"
  xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.Monson-Haefel.com/jwsbook http://www.Monson-Haefel.com/jwsbook/po2.xsd">
  <accountName>Amazon.com</accountName>
  <accountNumber>923</accountNumber>
  <shipAddress xsi:type="mh:UKAddress">
    <name>Amazon.co.uk</name>
    <street>Ridgmont Road</street>
    <city>Bedford</city>
    <country>United Kingdom</country>
    <postcode>MK43 0ZA</postcode>
  </shipAddress>
  <billAddress xsi:type="mh:BriefUSAddress">
    <name>Amazon.com</name>
    <street>1516 2nd Ave</street>
    <zip>90952</zip>
  </billAddress>
  <book>
    <title>Java Web Services</title>
    <quantity>300</quantity>
    <wholesale-price>24.99</wholesale-price>
  </book>
  <total>8997.00</total>
</purchaseOrder>
The xsi:type
attribute explicitly declares the type of the element in the instance document. Explicitly declaring an element's type with xsi:type
tells the parser to validate the element against the derived type instead of the type declared in the schema. You can think of this as “casting” an element, similar to casting a value in Java. The xsi:type
must be a type derived from the element's type declared in the schema.
The xsi:type attribute belongs to the XML schema-instance namespace, which is defined by the XML schema specification for use in instance documents. It's the same namespace that's used for the schemaLocation attribute.
You can declare complex types to be abstract much as you do Java classes. For example, although the Address type is a good base type for USAddress, UKAddress, and BriefUSAddress, it's too vague to be used directly in an instance document. To prevent such use, you can declare the type to be abstract. For example, if we add abstract="true" to the earlier definition of Address, as in the following snippet, it cannot be used directly in an instance document. An element declared with the Address type must instead name one of the types derived from it, using xsi:type.
<complexType name="Address" abstract="true">
<sequence>
<element name="name" type="string"/>
<element name="street" type="string" maxOccurs="unbounded"/>
<element name="city" type="string"/>
<element name="country" type="string"/>
</sequence>
<attribute name="category" type="string" default="business"/>
</complexType>
You can also declare complex types to be final, just as Java classes can be final, to prevent a complex type from being used as a base type for restriction or extension. The possible values for the final
attribute are "restriction"
, "extension"
, and "#all"
.
For example, we can declare the USAddress
type defined in Listing 3-21 to be “final by extension,” which prevents it from being extended but allows restriction.
<complexType name="USAddress" final="extension">
<complexContent>
<extension base="mh:Address">
<sequence>
<element name="state" type="string"/>
<element name="zip" type="string"/>
</sequence>
</extension>
</complexContent>
</complexType>
If a type is declared final="restriction", it can be extended but not restricted. If the final attribute equals "#all", the type cannot be used as a base type at all.
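As a brief sketch of the third value, the schema below marks UKAddress as final="#all", so no further type can be derived from it by extension or by restriction. Applying this to UKAddress is an assumption made for illustration; the chapter's own schema leaves UKAddress open.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook"
        elementFormDefault="qualified">
  <complexType name="Address">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string" maxOccurs="unbounded"/>
      <element name="city" type="string"/>
      <element name="country" type="string"/>
    </sequence>
    <attribute name="category" type="string" default="business"/>
  </complexType>
  <!-- final="#all": UKAddress may be used, but nothing may derive from it -->
  <complexType name="UKAddress" final="#all">
    <complexContent>
      <extension base="mh:Address">
        <sequence>
          <element name="postcode" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</schema>
```

The final attribute constrains only types that try to use UKAddress as a base. UKAddress itself is still free to extend Address.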
The built-in simple types are atomic and very restrictive, so they are an excellent foundation for validating data. For example, an unsignedShort
type cannot contain letters (only digits), it cannot contain a decimal point, and its value must be between 0 and 65,535. That's pretty restrictive, but what if it's not restrictive enough? XML schema allows us to create new simple types that are derived from existing simple types in order to constrain further the range of possible values that a simple type may represent.
For example, PurchaseOrder
declares the total
element as an XML schema float
type, which means it can contain any decimal value that can be represented with 32 bits of precision. That's a huge range of values, which includes very large negative and positive numbers. For example, both 2,093,020.99 and –24.9941 are valid float
values. Monson-Haefel Books wants to limit the value of the total
element to a much smaller range: any dollar amount between $0.00 and $100,000.00, a normal range of values used in purchase orders.
To constrain data in the total
element to this range, we restrict the built-in float
type to create a new type called Total
, as shown in Listing 3-26.
Example 3-26. Defining a Simple Type
<simpleType name="Total">
  <restriction base="float">
    <minInclusive value="0"/>
    <maxExclusive value="100000"/>
  </restriction>
</simpleType>
We declare the new Total
simple type with the simpleType
schema element. The restriction
element enables us to limit the range of an existing type, as well as determine its format.
The restriction element for simple types contains one or more facet elements. A facet is an element that represents an aspect or characteristic of the built-in type that can be modified. For example, the Total simple type declares that its minInclusive facet is "0" and its maxExclusive facet is "100000", thereby specifying that values held by elements of this type must be at least zero and less than 100,000.
The XML schema specification defines several facets you may use when restricting a float
type. The modifiable facets for float
are shown in Table 3-6.[5]
Table 3-6. Float Facets
Float Facet | Meaning |
---|---|
maxInclusive | The inclusive upper bound. The value may not exceed this amount. |
maxExclusive | The exclusive upper bound. The value must be less than this amount. |
minInclusive | The inclusive lower bound. The value must be at least this amount. |
minExclusive | The exclusive lower bound. The value must be greater than this amount. |
pattern | The format of the value, defined using a regular expression. |
enumeration | The set of allowed values. |
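The sketch below shows a few of these facets applied to float. The type names Percentage, ShippingWeight, and TaxRate are hypothetical and do not appear in the Monson-Haefel Books schema; they are here only to contrast inclusive bounds, an exclusive bound, and an enumeration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook">
  <!-- Hypothetical type: a percentage from 0 to 100, both bounds allowed -->
  <simpleType name="Percentage">
    <restriction base="float">
      <minInclusive value="0"/>
      <maxInclusive value="100"/>
    </restriction>
  </simpleType>
  <!-- Hypothetical type: a weight that must be greater than zero -->
  <simpleType name="ShippingWeight">
    <restriction base="float">
      <minExclusive value="0"/>
    </restriction>
  </simpleType>
  <!-- Hypothetical type: only these three rates are accepted -->
  <simpleType name="TaxRate">
    <restriction base="float">
      <enumeration value="0.0"/>
      <enumeration value="0.05"/>
      <enumeration value="0.1"/>
    </restriction>
  </simpleType>
</schema>
```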
You can use the Total
type in PurchaseOrder
or elsewhere, just as you can a built-in type. Listing 3-27 shows the PurchaseOrder
type using the new Total
simple type.
Example 3-27. Using Derived Simple Types in a Schema
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook"
        elementFormDefault="qualified">
  ...
  <simpleType name="Total">
    <restriction base="float">
      <minInclusive value="0.00"/>
      <maxExclusive value="100000.00"/>
    </restriction>
  </simpleType>
  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:Address"/>
      <element name="billAddress" type="mh:Address"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="mh:Total"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
  ...
</schema>
There are many kinds of facets, and each built-in type is assigned a subset of facets, which can be used to create new simple types. A complete list of facets for each data type can be found in XML Schema Part 2: Data Types.[6]
Most built-in types support the pattern
facet, which is very powerful. While other facets are pretty self-explanatory, the pattern
facet will look strange if you've never worked with regular expressions before. In XML schema, a regular expression is used to verify that the contents of an element or attribute adhere to a predefined character pattern.
For example, in addition to restricting the range of the Total
type defined in Listing 3-27 to values between 0 and 100,000, we can declare a pattern
facet to limit fractional amounts to two digits after the decimal point, as is conventional for dollar amounts.
<simpleType name="Total">
  <restriction base="float">
    <pattern value="[0-9]+\.[0-9]{2}"/>
    <minInclusive value="0"/>
    <maxExclusive value="100000"/>
  </restriction>
</simpleType>
The regular expression "[0-9]+\.[0-9]{2}" specifies that there must be at least one digit before the decimal point and exactly two digits following the decimal point. The backslash matters: an unescaped period matches any character, not just a decimal point. The following table shows valid and invalid values for the Total type.
Valid Values | Invalid Values |
---|---|
0.00 | .00 |
0.10 | 0.1 |
1.01 | -1.00 |
99999.99 | 100001.00 |
The pattern
facet is commonly applied to string
types. For example, we can define a USZipCode
type that restricts a string
value either to five digits, or to nine digits with the last four set off by a hyphen. Listing 3-28 illustrates.
Example 3-28. Using the pattern Facet
<simpleType name="USZipCode">
<restriction base="string">
<pattern value="[0-9]{5}(-[0-9]{4})?" />
</restriction>
</simpleType>
We could modify Listing 3-21 as in the following snippet to use the USZipCode
simple type for the USAddress
and BriefUSAddress
types, to provide stronger validation of U.S. addresses.
<complexType name="USAddress" final="extension">
  <complexContent>
    <extension base="mh:Address">
      <sequence>
        <element name="state" type="string"/>
        <element name="zip" type="mh:USZipCode"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
...
<complexType name="BriefUSAddress">
  <complexContent>
    <restriction base="mh:USAddress">
      <sequence>
        <element name="name" type="string"/>
        <element name="street" type="string"/>
        <element name="zip" type="mh:USZipCode"/>
      </sequence>
      <attribute name="category" type="string" default="business"/>
    </restriction>
  </complexContent>
</complexType>
Appendix B provides an overview of schema regular expressions. Readers already familiar with regular expressions may also find this appendix valuable because XML schema's regular-expression syntax has some small but important differences from that of other languages or tools (Perl, for example).
The enumeration
facet restricts the value of any simple type (except boolean
) to a set of distinct values. For example, we can create a new USState
type, which restricts the value of a string
type to two-letter state abbreviations as shown in Listing 3-29.[7]
Example 3-29. Defining an Enumeration
<simpleType name="USState">
  <restriction base="string">
    <enumeration value="AK"/> <!-- Alaska -->
    <enumeration value="AL"/> <!-- Alabama -->
    <enumeration value="AR"/> <!-- Arkansas -->
    <!-- and so on -->
  </restriction>
</simpleType>
We can then modify Listing 3-21 to use the USState
enumeration type in the state
element of the USAddress
type, in order to constrain its value to valid U.S. state abbreviations.
<complexType name="USAddress" final="extension">
<complexContent>
<extension base="mh:Address">
<sequence>
<element name="state" type="mh:USState"/>
<element name="zip" type="mh:USZipCode"/>
</sequence>
</extension>
</complexContent>
</complexType>
The simple types we have examined thus far are all atomic, which means that each one represents a single piece of data. For example, although the name
element of Address
may contain spaces (e.g., <name>Richard W. Monson-Haefel</name>
), the string
value is still considered one piece of data. List and union types, however, allow us to define elements or attributes that contain multiple pieces of data separated by spaces.
While list types are supported by many Web service platforms, union types are not. Union and list types should be used with care, especially when interoperability across programming environments is important.
A list is a sequence of simple-type values separated by white space. For example, you can define a USStateList
type to contain several USState
type values, as shown in Listing 3-30.
Example 3-30. Defining a List Type
<simpleType name="USStateList">
  <list itemType="mh:USState"/>
</simpleType>
In an instance document, an element of the USStateList
type could contain zero or more state abbreviations separated by spaces.
<list-of-states>CA NY FL AR NH</list-of-states>
A list type may have length
, minLength
, maxLength
, and enumeration
facets. The length facets control the number of tokens contained by the element or attribute, while the enumeration
facet defines a strict set of valid values.
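As a sketch of the length facets on a list, the schema below derives a hypothetical ShortUSStateList type that must hold between one and three state abbreviations. The USState enumeration is abbreviated to three values to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mh="http://www.Monson-Haefel.com/jwsbook"
        targetNamespace="http://www.Monson-Haefel.com/jwsbook">
  <!-- Abbreviated item type; a real schema would enumerate every state -->
  <simpleType name="USState">
    <restriction base="string">
      <enumeration value="AK"/>
      <enumeration value="AL"/>
      <enumeration value="AR"/>
    </restriction>
  </simpleType>
  <simpleType name="USStateList">
    <list itemType="mh:USState"/>
  </simpleType>
  <!-- Hypothetical type: the length facets count list items, not characters -->
  <simpleType name="ShortUSStateList">
    <restriction base="mh:USStateList">
      <minLength value="1"/>
      <maxLength value="3"/>
    </restriction>
  </simpleType>
</schema>
```

With this definition, a value such as "AK AL" is acceptable, while an empty value or a value with four abbreviations is not.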
A list type can be based on any simple type, built-in or derived, but not on other list types or on complex types. XML schema defines a built-in list type called NMTOKENS. NMTOKENS is a list of the NMTOKEN simple type, which is a string without spaces, line feeds, or tabs (see Table 3-2). For compatibility with DTDs, NMTOKENS should be used only with attributes.
List types should be based on simple types that do not have spaces because the parser assumes that spaces separate values in the list. NMTOKENS
is recommended for lists of attributes. For elements, a list type based on the token
type (see Table 3-2) or simple types with no spaces, such as USState
and USZipCode
, is strongly recommended.
A union is a set of valid simple types. It's a lot like a list type, except it can accommodate more than one kind of simple type. For example, the union type USStateOrZipUnion
allows the value to be either a USStateList
type or a USZipCode
type, as shown in Listing 3-31.
Example 3-31. Defining a Union Type
<simpleType name="USStateOrZipUnion">
  <union memberTypes="mh:USStateList mh:USZipCode"/>
</simpleType>
An element or attribute based on this type can hold either a USStateList or a USZipCode. It cannot, however, contain a mix of values. In other words, a USStateOrZipUnion can contain a list of state codes or a single zip code, but not a mix of states and zip codes or more than one zip code. In the following example, valid and invalid values are shown for the hypothetical location element of type USStateOrZipUnion.
<!-- valid use of union type -->
<location>CA NJ AK</location>
<location>94108</location>

<!-- invalid use of union type -->
<location>94108 CA 554011 MN</location>
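For reference, a union type is attached to an element or attribute like any other simple type. The sketch below declares the hypothetical location element used above, plus a hypothetical Region type with a coverage attribute; only USStateOrZipUnion comes from Listing 3-31.

```xml
<!-- Hypothetical declarations; only USStateOrZipUnion is defined in Listing 3-31 -->
<element name="location" type="mh:USStateOrZipUnion"/>

<complexType name="Region">
  <sequence>
    <element name="name" type="string"/>
  </sequence>
  <!-- A union type can constrain an attribute value as well -->
  <attribute name="coverage" type="mh:USStateOrZipUnion"/>
</complexType>
```

A validating processor tries the member types in the order they are listed in memberTypes, so a value such as 94108 is first checked against USStateList and, failing that, against USZipCode.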
You can combine an element declaration with a complex or simple type declaration to create an anonymous type. An anonymous type is not named and cannot be referred to outside the element that declares it. For example, throughout this chapter the Purchase Order schema has defined a PurchaseOrder type and a purchaseOrder element separately, as shown in the following snippet from Listing 3-13.
<element name="purchaseOrder" type="mh:PurchaseOrder"/>
<complexType name="PurchaseOrder">
  <sequence>
    <element name="accountName" type="string"/>
    <element name="accountNumber" type="unsignedShort"/>
    <element name="shipAddress" type="mh:Address"/>
    <element name="billAddress" type="mh:Address"/>
    <element name="book" type="mh:Book"/>
    <element name="total" type="mh:Total"/>
  </sequence>
  <attribute name="orderDate" type="date"/>
</complexType>
The PurchaseOrder type is not very useful outside the purchaseOrder element, so we can combine the two declarations into one as in Listing 3-32.
Example 3-32. Defining an Anonymous Type
<element name="purchaseOrder">
  <complexType>
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:Address"/>
      <element name="billAddress" type="mh:Address"/>
      <element name="book" type="mh:Book"/>
      <element name="total" type="mh:Total"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
</element>
We've combined the definition of the PurchaseOrder type with the declaration of the purchaseOrder element. Notice that the element declaration doesn't need a type attribute because it defines its own type, and that the complexType declaration doesn't declare a name attribute; it's anonymous.
Anonymous types can simplify schemas, but they can also be abused if nested too deeply or applied indiscriminately. A balanced approach is better, using a combination of anonymous types and named types. For example, the purchaseOrder anonymous type can contain other anonymous types as well as named types. In Listing 3-33 the book and total elements are declared with nested anonymous types, while Address remains a named type that is defined elsewhere.
Example 3-33. Nesting Anonymous Types
<element name="purchaseOrder">
  <complexType>
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="mh:Address"/>
      <element name="billAddress" type="mh:Address"/>
      <element name="book">
        <complexType>
          <sequence>
            <element name="title" type="string"/>
            <element name="quantity" type="unsignedShort"/>
            <element name="wholesale-price" type="float"/>
          </sequence>
        </complexType>
      </element>
      <element name="total">
        <simpleType>
          <restriction base="float">
            <minInclusive value="0"/>
            <maxExclusive value="100000"/>
            <!-- The dot is escaped so that it matches only a literal decimal point -->
            <pattern value="[0-9]+\.[0-9]{2}"/>
          </restriction>
        </simpleType>
      </element>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
</element>
Anonymous types can be based on complex or simple types. In this example, the total element is defined with an anonymous simple type, using simple type inheritance.
Because anonymous types have no names, they cannot be referred to outside the element that defines them. Anonymous types are not reusable, and you should employ them only when you know that the type won't be useful in other schemas. For example, the book and total elements are based on anonymous types that might well be useful in other circumstances; you might benefit from defining them separately as named types. In the end it's a judgment call.
You can combine schemas using two different elements, include and import. An import allows you to combine schemas from different namespaces, while an include lets you combine schemas from the same namespace.
A schema may import types from other schemas, allowing more modular schema design and type reuse. For example, we can define a separate schema and namespace for all the types related to mailing addresses: Address, USAddress, UKAddress, BriefUSAddress, USZipCode, and USState. This schema would define the complete Address Markup Language for Monson-Haefel Books. Listing 3-34 shows an abridged version of this schema.
Example 3-34. The Address Markup Schema
<?xml version="1.0" encoding="UTF-8" ?>
<schema
targetNamespace="http://www.Monson-Haefel.com/jwsbook/ADDR"
xmlns:addr="http://www.Monson-Haefel.com/jwsbook/ADDR"
xmlns="http://www.w3.org/2001/XMLSchema">
<element name="address" type="addr:Address"/>
<simpleType name="USZipCode">
<restriction base="string">
<pattern value="[0-9]{5}(-[0-9]{4})?"/>
</restriction>
</simpleType>
<simpleType name="USState">
<restriction base="string">
<enumeration value="AK"/> <!-- Alaska -->
<enumeration value="AL"/> <!-- Alabama -->
<enumeration value="AR"/> <!-- Arkansas -->
<!-- and so on -->
</restriction>
</simpleType>
<complexType name="Address" abstract="true">
<sequence>
<element name="name" type="string"/>
<element name="street" type="string" maxOccurs="unbounded"/>
<element name="city" type="string"/>
<element name="country" type="string"/>
</sequence>
<attribute name="category" type="string" default="business"/>
</complexType>
<complexType name="USAddress" final="extension">
<complexContent>
<extension base="addr:Address">
<sequence>
<element name="state" type="addr:USState"/>
<element name="zip" type="addr:USZipCode"/>
</sequence>
</extension>
</complexContent>
</complexType>
<complexType name="UKAddress">
<complexContent>
<extension base="addr:Address">
<sequence>
<element name="postcode" type="string"/>
</sequence>
</extension>
</complexContent>
</complexType>
<complexType name="BriefUSAddress">
<complexContent>
<restriction base="addr:USAddress">
<sequence>
<element name="name" type="string"/>
<element name="street" type="string"/>
<element name="zip" type="addr:USZipCode"/>
</sequence>
<attribute name="category" type="string" default="business"/>
</restriction>
</complexContent>
</complexType>
</schema>
The targetNamespace of the Address Markup schema is "http://www.Monson-Haefel.com/jwsbook/ADDR", which is a separate namespace from that of the purchase-order elements. Because the PurchaseOrder type depends on the Address type, we'll need to import the Address Markup schema into the Purchase Order schema as in Listing 3-35.
Example 3-35. Importing a Schema
<?xml version="1.0" encoding="UTF-8" ?>
<schema
    targetNamespace="http://www.Monson-Haefel.com/jwsbook/PO"
    xmlns:po="http://www.Monson-Haefel.com/jwsbook/PO"
    xmlns:addr="http://www.Monson-Haefel.com/jwsbook/ADDR"
    xmlns="http://www.w3.org/2001/XMLSchema">

  <import namespace="http://www.Monson-Haefel.com/jwsbook/ADDR"
          schemaLocation="http://www.Monson-Haefel.com/jwsbook/addr.xsd" />

  <element name="purchaseOrder" type="po:PurchaseOrder"/>

  <simpleType name="Total">
    <restriction base="float">
      <minInclusive value="0.00"/>
      <maxExclusive value="100000.00"/>
      <!-- The dot is escaped so that it matches only a literal decimal point -->
      <pattern value="[0-9]+\.[0-9]{2}"/>
    </restriction>
  </simpleType>

  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="addr:Address"/>
      <element name="billAddress" type="addr:Address"/>
      <element name="book" type="po:Book"/>
      <element name="total" type="po:Total"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>

  <complexType name="Book">
    <sequence>
      <element name="title" type="string"/>
      <element name="quantity" type="unsignedShort"/>
      <element name="wholesale-price" type="float"/>
    </sequence>
  </complexType>
</schema>
The import mechanism enables you to combine schemas to create larger, more complex schemas. It's very useful when you see that some aspects of a schema, such as the address types, are reusable and need their own namespace and schema. The imported namespace needs to be assigned a prefix before we can use it. In this case, it's assigned the prefix addr in the root schema element.
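To see how the two namespaces meet in practice, here is a sketch of an instance document that might validate against the Purchase Order schema in Listing 3-35. It rests on several assumptions: because the listing does not set elementFormDefault, local elements are taken to be unqualified; because Address is abstract in Listing 3-34, each address names a concrete type with xsi:type; and all names, addresses, and amounts are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance; all data values are illustrative only -->
<po:purchaseOrder orderDate="2003-09-22"
    xmlns:po="http://www.Monson-Haefel.com/jwsbook/PO"
    xmlns:addr="http://www.Monson-Haefel.com/jwsbook/ADDR"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <accountName>Example Account</accountName>
  <accountNumber>923</accountNumber>
  <!-- Address is abstract, so a concrete derived type must be named -->
  <shipAddress xsi:type="addr:USAddress">
    <name>Example Recipient</name>
    <street>1 Example Street</street>
    <city>Anchorage</city>
    <country>USA</country>
    <state>AK</state>
    <zip>99501</zip>
  </shipAddress>
  <billAddress xsi:type="addr:UKAddress">
    <name>Example Payer</name>
    <street>2 Example Road</street>
    <city>London</city>
    <country>UK</country>
    <postcode>SW1A 1AA</postcode>
  </billAddress>
  <book>
    <title>J2EE Web Services</title>
    <quantity>1</quantity>
    <wholesale-price>24.99</wholesale-price>
  </book>
  <total>24.99</total>
</po:purchaseOrder>
```

The addr prefix appears only in the xsi:type values here, because under the unqualified-local-elements assumption the child elements themselves carry no namespace.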
In addition to the import element, there is another way of combining schemas called include, which can be used only to combine schemas with exactly the same targetNamespace. Including is useful when a schema becomes large and difficult to maintain. The Purchase Order schema has not become that unwieldy, but just as an example, we could place the definitions of the Total and Book types into a separate schema, then use an include element to combine them with the Purchase Order schema. Listing 3-36 shows a schema document for the Total and Book types, which we'll soon include in the Purchase Order schema.
Example 3-36. The Book and Total Schema
<?xml version="1.0" encoding="UTF-8" ?>
<schema
    targetNamespace="http://www.Monson-Haefel.com/jwsbook/PO"
    xmlns:po="http://www.Monson-Haefel.com/jwsbook/PO"
    xmlns="http://www.w3.org/2001/XMLSchema">

  <simpleType name="Total">
    <restriction base="float">
      <minInclusive value="0.00"/>
      <maxExclusive value="100000.00"/>
      <!-- The dot is escaped so that it matches only a literal decimal point -->
      <pattern value="[0-9]+\.[0-9]{2}"/>
    </restriction>
  </simpleType>

  <complexType name="Book">
    <sequence>
      <element name="title" type="string"/>
      <element name="quantity" type="unsignedShort"/>
      <element name="wholesale-price" type="float"/>
    </sequence>
  </complexType>
</schema>
Here the Book and Total types have been placed in their own schema document, but notice that the targetNamespace is the same as in the Purchase Order schema in Listing 3-35. We can combine these two schemas using an include statement. Listing 3-37 shows the use of both import and include.
Example 3-37. Using Import and Include Together
<?xml version="1.0" encoding="UTF-8" ?>
<schema
    targetNamespace="http://www.Monson-Haefel.com/jwsbook/PO"
    xmlns:po="http://www.Monson-Haefel.com/jwsbook/PO"
    xmlns:addr="http://www.Monson-Haefel.com/jwsbook/ADDR"
    xmlns="http://www.w3.org/2001/XMLSchema">

  <include schemaLocation="http://www.Monson-Haefel.com/jwsbook/po.xsd" />
  <import namespace="http://www.Monson-Haefel.com/jwsbook/ADDR"
          schemaLocation="http://www.Monson-Haefel.com/jwsbook/addr.xsd" />

  <element name="purchaseOrder" type="po:PurchaseOrder"/>

  <complexType name="PurchaseOrder">
    <sequence>
      <element name="accountName" type="string"/>
      <element name="accountNumber" type="unsignedShort"/>
      <element name="shipAddress" type="addr:Address"/>
      <element name="billAddress" type="addr:Address"/>
      <element name="book" type="po:Book"/>
      <element name="total" type="po:Total"/>
    </sequence>
    <attribute name="orderDate" type="date"/>
  </complexType>
</schema>
Notice that we don't specify the namespace of the included schema, because it's expected to match the targetNamespace of the schema doing the including.
XML schema provides a standard typing system for defining markup languages and validating XML documents. SOAP, WSDL, and UDDI data structures are all defined in XML schema, so a good understanding of this technology is essential. There is a lot more to XML schema than this chapter covers; it would require an entire book to do the topic justice, but with this primer under your belt you are prepared to investigate new concepts by reading the W3C recommendation entitled “XML Schema” directly.
The W3C's XML schema recommendation is the last word on the topic, but it's not always an easy read. It's divided into three parts. The Primer, Part 0, is usually the best place to start when you need to learn about new features. It's a non-normative overview with examples. Part 1 covers the structure of schemas, and Part 2 defines concisely the XML schema data types. You can find these three documents at
http://www.w3.org/TR/xmlschema-0/
http://www.w3.org/TR/xmlschema-1/
http://www.w3.org/TR/xmlschema-2/
Although XML schema is the basis of Web services in J2EE, it's not the only schema language available for XML today. Alternatives include DTDs (see Appendix A), Schematron, and RELAX NG, among others. Of these, Schematron appears to be the best complement to XML schema, or at least it offers validation checks that XML schema cannot duplicate.
Schematron is based on XPath and XSLT and is used for defining context-dependent rules for validating XML documents. For example, in the purchase-order document shown in Listing 3-38, you could use Schematron to ensure that the value of the total element equals the value of the quantity element multiplied by the value of the wholesale-price element.
Example 3-38. PurchaseOrder Instance Document
<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="2003-09-22"
    xmlns:mh="http://www.Monson-Haefel.com/jwsbook">
  ...
  <book>
    <title>J2EE Web Services</title>
    <quantity>300</quantity>
    <wholesale-price>24.99</wholesale-price>
  </book>
  <total>7497.00</total>
</purchaseOrder>
XML schema does not provide this type of business-rule support, so you may well want to use Schematron in combination with XML schema to provide more robust validation. You can find out more about Schematron at Rick Jelliffe's Web site, http://www.ascc.net/xml/schematron/.
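A rule of this kind might be sketched as follows. This is an illustration under stated assumptions rather than a tested rule set: it uses the ISO Schematron namespace (older Schematron releases use a different one), it assumes the instance elements are in no namespace as in Listing 3-38, and the pattern id total-check is made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes ISO Schematron and un-namespaced instance elements -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="total-check">
    <sch:rule context="purchaseOrder">
      <!-- Compare in whole cents to sidestep floating-point rounding in XPath -->
      <sch:assert test="round(total * 100) = round(book/quantity * book/wholesale-price * 100)">
        The total must equal quantity multiplied by wholesale-price.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Comparing rounded cent values rather than the raw numbers is a design choice: XPath arithmetic is done in floating point, so a direct equality test on decimal prices can fail even when the figures agree on paper.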
[1] The Universal Character Set (ISO/IEC 10646-1993) is a superset of all other character codes, including UTF-8 and UTF-16.
[2] World Wide Web Consortium, “XML Schema Part 2: Datatypes”, W3C Recommendation, May 2, 2001. Available at http://www.w3.org/TR/xmlschema-2/.
[3] Whether you should is open to interpretation. For example, declaring the location of the XML Schema-Instance works with the Apache Xerces-J's SAX parser but not with Altova's XMLSpy (version 5, release 3).
[4] Actually, many UK addresses may not have a city, but we will ignore that detail in this example.
[5] Missing from this table is the whiteSpace facet, not shown because its value cannot be modified for a float type, which must always be "collapse".
[6] World Wide Web Consortium, “XML Schema Part 2: Datatypes”, W3C Recommendation, May 2, 2001. Available at http://www.w3.org/TR/xmlschema-2/.
[7] The USState type would also include the District of Columbia (D.C.), commonwealths (e.g., Puerto Rico), territories (e.g., Virgin Islands), as well as special codes for the U.S. armed services abroad (e.g., Armed Forces Europe).