XML Schema and XML Namespaces

We’ll start with the basics of XML Schemas and XML Namespaces. It’s assumed that you already understand how to use basic XML elements and attributes. If you don’t, you should probably read a primer on XML before proceeding. I recommend the book Learning XML by Erik T. Ray (O’Reilly). If you already understand how XML Schema and XML Namespaces work, skip ahead to the section on SOAP.

XML Schema

An XML Schema is similar in purpose to a DTD (Document Type Definition), which validates the structure of an XML document. To illustrate some of the basic concepts of XML Schema, let’s start with an XML document with address information:

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<address>
  <street>3243 West 1st Ave.</street>
  <city>Madison</city>
  <state>WI</state>
  <zip>53591</zip>
</address>

To ensure that the XML document contains the proper types of elements and data, the address information must be evaluated for correctness. There are two ways that the correctness of an XML document can be measured: whether it is well formed and whether it is valid. To be well formed, an XML document must obey the syntactic rules of the XML markup language: it must use proper attribute declarations, the correct characters to denote the start and end of elements, and so on. Most XML parsers based on standards like SAX and DOM automatically detect documents that aren’t well formed.
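As a rough illustration of the well-formedness check, here is a minimal sketch that uses the ElementTree parser from Python's standard library (chosen only for illustration; any XML parser behaves similarly). The helper name is_well_formed and the two sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser reports a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A syntactically correct fragment of the address document.
GOOD = "<address><street>3243 West 1st Ave.</street><zip>53591</zip></address>"
# The <street> element is closed with the wrong end tag.
BAD = "<address><street>3243 West 1st Ave.</zip></address>"

if __name__ == "__main__":
    print(is_well_formed(GOOD))
    print(is_well_formed(BAD))
```

Note that this check says nothing about validity: a document can be well formed and still be missing elements that the application requires.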

In addition to being well formed, it’s sometimes important to check that the document uses the right types of elements and attributes in the correct order and structure. A document that meets these criteria is called valid. However, the criteria for validity have nothing to do with XML itself; they have more to do with the application in which the document is used. For example, the Address document would not be valid if it didn’t include the zip or state elements. In order to validate an XML document, you need a way to represent these application-specific constraints.

The XML Schema for the Address XML document looks like this:

<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:titan="http://www.titan.com/Reservation"
    targetNamespace="http://www.titan.com/Reservation">

  <element name="address" type="titan:AddressType"/>

  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>

</schema>

The first thing to focus on in this XML Schema is the <complexType> element, which declares a type of element in much the same way that a Java class declares a type of object. The <complexType> element explicitly declares the names, types, and order of elements that an AddressType element may contain. In this case, it may contain four elements of type string in the following order: street, city, state, and zip. Validation is pretty strict, so any XML document that claims conformance with this XML Schema must contain exactly the right elements with the right data types, in the correct order.
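To make the ordering rule concrete, here is a hand-rolled sketch in Python. It is not a schema validator; it only compares the names of an element's children with the order declared in the AddressType <sequence>. The names ADDRESS_SEQUENCE and follows_sequence are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Child element names in the order the AddressType <sequence> declares them.
ADDRESS_SEQUENCE = ["street", "city", "state", "zip"]


def follows_sequence(element, expected_names):
    """True if the element's children are exactly expected_names, in that order."""
    return [child.tag for child in element] == expected_names


VALID = ET.fromstring(
    "<address><street>3243 West 1st Ave.</street><city>Madison</city>"
    "<state>WI</state><zip>53591</zip></address>"
)
# The zip element is missing and city/state are swapped.
INVALID = ET.fromstring(
    "<address><street>3243 West 1st Ave.</street><state>WI</state>"
    "<city>Madison</city></address>"
)

if __name__ == "__main__":
    print(follows_sequence(VALID, ADDRESS_SEQUENCE))
    print(follows_sequence(INVALID, ADDRESS_SEQUENCE))
```

A real XML Schema processor also checks the data type of each element's content, which this sketch ignores.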

XML Schema defines more than 40 simple data types, called built-in types. Built-in types are a part of the XML Schema language and are automatically supported by any XML Schema-compliant parser. Table 14-1 shows a short list of some of the built-in types. It also shows the Java type that corresponds to each built-in type. (Table 14-1 presents only a subset of all the XML Schema (XSD) built-in types, but it’s more than enough for this book.)

Table 14-1. XML Schema built-in types and their corresponding Java types

| XML Schema built-in type | Java type |
| --- | --- |
| byte | byte |
| boolean | boolean |
| short | short |
| int | int |
| long | long |
| float | float |
| double | double |
| string | java.lang.String |
| dateTime | java.util.Calendar |
| integer | java.math.BigInteger |
| decimal | java.math.BigDecimal |
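Each built-in type also fixes how its values are written as text. The following sketch shows the idea in Python rather than Java: it converts element text with a hypothetical converter table (CONVERTERS, convert, and parse_boolean are names invented for this example). Python's own parsers are only an approximation of the XML Schema lexical rules; for instance, int accepts some spellings that xsd:int does not.

```python
from datetime import datetime
from decimal import Decimal


def parse_boolean(text):
    """xsd:boolean accepts the literals true, false, 1 and 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not an xsd:boolean literal: {text!r}")


# Hypothetical converter table keyed by XSD built-in type name.
CONVERTERS = {
    "boolean": parse_boolean,
    "int": int,
    "double": float,
    "decimal": Decimal,
    "string": str,
    "dateTime": datetime.fromisoformat,  # approximates the xsd:dateTime form
}


def convert(xsd_type, text):
    """Convert element text using the converter registered for xsd_type."""
    return CONVERTERS[xsd_type](text)


if __name__ == "__main__":
    print(convert("int", "123"))
    print(convert("dateTime", "2005-09-30T00:00:00"))
```

Text that does not follow the type's lexical form raises ValueError here, which mirrors how a validating parser would reject the element.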

By default, each element declared by a <complexType> must occur once in an XML document, but you can specify that an element is optional or that it must occur more than once by using the occurrence attributes. For example, we can say that the street element must occur once but may occur two times:

  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string" maxOccurs="2" minOccurs="1" />
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>

The maxOccurs and minOccurs attributes both default to 1, indicating that the element must occur exactly once. Setting maxOccurs to "2" allows an XML document to have two street elements or just one. You can also set maxOccurs to "unbounded", which means the element may occur as many times as needed. Setting minOccurs to "0" means the element is optional and can be omitted.
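The occurrence rules boil down to a simple range check. The sketch below is a hand-written illustration in Python, not part of any XML Schema API; the function name occurrence_ok and the second street value ("Suite 4") are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def occurrence_ok(count, min_occurs="1", max_occurs="1"):
    """Check a count against minOccurs/maxOccurs; both default to 1."""
    if count < int(min_occurs):
        return False
    # "unbounded" means there is no upper limit.
    return max_occurs == "unbounded" or count <= int(max_occurs)


# An address with two street elements (the second value is made up).
ADDRESS = ET.fromstring(
    "<address><street>3243 West 1st Ave.</street><street>Suite 4</street>"
    "<city>Madison</city><state>WI</state><zip>53591</zip></address>"
)
STREET_COUNT = len(ADDRESS.findall("street"))

if __name__ == "__main__":
    # Allowed when maxOccurs="2", rejected under the default of 1.
    print(occurrence_ok(STREET_COUNT, "1", "2"))
    print(occurrence_ok(STREET_COUNT))
```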

The <element> declarations are nested under a <sequence> element, which indicates that the elements must occur in the order they are declared. You can also nest the elements under an <all> declaration, which allows the elements to appear in any order. In XML Schema 1.0, an element declared inside <all> may occur at most once, so street cannot keep its maxOccurs="2" setting here. The following shows the AddressType declared with an <all> element instead of a <sequence> element:

  <complexType name="AddressType">
    <all>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </all>
  </complexType>
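The difference between <sequence> and <all> can be sketched as two small checks in Python. This is only an illustration of the ordering rules, assuming each element occurs exactly once; the function names matches_sequence and matches_all are invented for this example.

```python
import xml.etree.ElementTree as ET

ADDRESS_NAMES = ["street", "city", "state", "zip"]


def matches_sequence(element, names):
    """<sequence>: children must appear in exactly the declared order."""
    return [child.tag for child in element] == names


def matches_all(element, names):
    """<all>: each declared child appears once, in any order."""
    return sorted(child.tag for child in element) == sorted(names)


# The same four elements as the address document, in a different order.
REORDERED = ET.fromstring(
    "<address><zip>53591</zip><city>Madison</city>"
    "<state>WI</state><street>3243 West 1st Ave.</street></address>"
)

if __name__ == "__main__":
    print(matches_sequence(REORDERED, ADDRESS_NAMES))
    print(matches_all(REORDERED, ADDRESS_NAMES))
```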

In addition to declaring elements of XSD built-in types, you can declare elements based on complex types. This is similar to how Java class types declare fields that are other Java class types. For example, we can define a CustomerType that makes use of the AddressType:

<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:titan="http://www.titan.com/Reservation"
    targetNamespace="http://www.titan.com/Reservation">

  <element name="customer" type="titan:CustomerType"/>

  <complexType name="CustomerType">
    <sequence>
      <element name="last-name" type="string"/>
      <element name="first-name" type="string"/>
      <element name="address" type="titan:AddressType"/>
    </sequence>
  </complexType>
  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>

</schema>

This XSD tells us that an element of CustomerType must contain a <last-name> and <first-name> element of built-in type string, and an element of type AddressType. This is pretty straightforward, except for the titan: prefix on AddressType. That prefix identifies the XML Namespace of the AddressType; we’ll discuss namespaces later in the chapter. For now, just think of it as declaring that the AddressType is a custom type defined by Titan Cruises; it’s not a standard XSD built-in type. An XML document that conforms to the Customer XSD would look like this:

<?xml version='1.0' encoding='UTF-8' ?>
<customer>
  <last-name>Jones</last-name>
  <first-name>Sara</first-name>
  <address>
    <street>3243 West 1st Ave.</street>
    <city>Madison</city>
    <state>WI</state>
    <zip>53591</zip>
  </address>
</customer>

Building on what you’ve learned so far, we can create a Reservation schema, using the CustomerType and the AddressType, and a new CreditCardType:

<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:titan="http://www.titan.com/Reservation"
    targetNamespace="http://www.titan.com/Reservation">

  <element name="reservation" type="titan:ReservationType"/>

  <complexType name="ReservationType">
    <sequence>
      <element name="customer" type="titan:CustomerType"/>
      <element name="cruise-id" type="int"/>
      <element name="cabin-id" type="int"/>
      <element name="price-paid" type="double"/>
    </sequence>
  </complexType>
  <complexType name="CustomerType">
    <sequence>
      <element name="last-name" type="string"/>
      <element name="first-name" type="string"/>
      <element name="address" type="titan:AddressType"/>
      <element name="credit-card" type="titan:CreditCardType"/>
    </sequence>
  </complexType>
  <complexType name="CreditCardType">
    <sequence>
      <element name="exp-date" type="dateTime"/>
      <element name="number" type="string"/>
      <element name="name" type="string"/>
      <element name="organization" type="string"/>
    </sequence>
  </complexType>
  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>

An XML document that conforms to the Reservation XSD would include information describing the customer (name and address), credit card information, and the identity of the cruise and cabin that is being reserved. This document might be sent to Titan Cruises from a travel agency that cannot access the TravelAgent EJB to make reservations. Here’s an XML document that conforms to the Reservation XSD:

<?xml version='1.0' encoding='UTF-8' ?>
<reservation>
  <customer>
    <last-name>Jones</last-name>
    <first-name>Sara</first-name>
    <address>
      <street>3243 West 1st Ave.</street>
      <city>Madison</city>
      <state>WI</state>
      <zip>53591</zip>
    </address>
    <credit-card>
      <exp-date>2005-09-30T00:00:00</exp-date>
      <number>0394029302894028930</number>
      <name>Sara Jones</name>
      <organization>VISA</organization>
    </credit-card>
  </customer>
  <cruise-id>123</cruise-id>
  <cabin-id>333</cabin-id>
  <price-paid>6234.55</price-paid>
</reservation>

At runtime, the XML parser compares the document to its Schema, ensuring that the document conforms to the rules set down by the Schema. If the document doesn’t adhere to the Schema, it is considered invalid, and the parser produces error messages. An XML Schema checks that XML documents received by your system are properly structured, so you won’t encounter errors while parsing the documents and extracting the data. For example, if someone sent your application a Reservation document that omitted the credit-card element, the XML parser could reject the document as invalid before your code even sees it: you don’t have to worry about errors in your code caused by missing information in the document.

This brief overview represents only the tip of the iceberg. XML Schema is a very rich XML typing system and can only be given sufficient attention in a text dedicated to the subject. For an in-depth and insightful coverage of XML Schema, read XML Schema: The W3C’s Object-Oriented Descriptions for XML by Eric van der Vlist (O’Reilly) or read the XML Schema specification, starting with the primer at the W3C (World Wide Web Consortium) web site (http://www.w3.org/TR/xmlschema-0/).

XML Namespaces

The Reservation schema defines an XML markup language that describes the structure of a specific kind of XML document. Just as a class defines a type of Java object, an XML markup language, defined by an XML Schema, defines a type of XML document. In some cases, it’s convenient to combine two or more XML markup languages into a single document, so that the elements from each markup language can be validated separately using different XML Schemas. This is especially useful when you want to reuse a markup language in many different contexts. For example, the AddressType defined in the previous section is useful in a variety of contexts, not just the Reservation XSD, so it could be defined as a separate markup language in its own XML Schema:

<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.titan.com/Address">

  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>

In order to use different markup languages in the same XML document, you must clearly identify the markup language to which each element belongs. Here is an XML document for a reservation, but this time we are using XML Namespaces to separate the address information from the reservation information:

<?xml version='1.0' encoding='UTF-8' ?>
<res:reservation xmlns:res="http://www.titan.com/Reservation" >
  <res:customer>
    <res:last-name>Jones</res:last-name>
    <res:first-name>Sara</res:first-name>

    <addr:address xmlns:addr="http://www.titan.com/Address">
      <addr:street>3243 West 1st Ave.</addr:street>
      <addr:city>Madison</addr:city>
      <addr:state>WI</addr:state>
      <addr:zip>53591</addr:zip>
    </addr:address>

    <res:credit-card>
      <res:exp-date>2005-09-30T00:00:00</res:exp-date>
      <res:number>0394029302894028930</res:number>
      <res:name>Sara Jones</res:name>
      <res:organization>VISA</res:organization>
    </res:credit-card>
  </res:customer>
  <res:cruise-id>123</res:cruise-id>
  <res:cabin-id>333</res:cabin-id>
  <res:price-paid>6234.55</res:price-paid>
</res:reservation>

All the elements for the address information are prefixed with the characters addr:, and all the reservation elements are prefixed with res:. These prefixes allow parsers to identify and separate the elements that belong to the Address markup from those that belong to the Reservation markup. As a result, the address elements can be validated against the Address XSD while the reservation elements are validated against the Reservation XSD. The prefixes are assigned using XML Namespace declarations, which appear as the xmlns:res and xmlns:addr attributes in the previous listing. An XML Namespace declaration follows this format:

xmlns:prefix="URI"

The prefix can be anything you like, as long as it does not include blanks or any special characters. We use prefixes that are abbreviations for the name of the markup language: res stands for Reservation XSD and addr stands for Address XSD. This is the convention that most XML documents follow, but it’s not a requirement; you could use prefixes like foo or bar or anything else you fancy.
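To see that the prefix itself carries no meaning, consider this small Python sketch. ElementTree reports every element name in the form {URI}local-name, so two documents that bind different prefixes to the same URI yield the same name. The helper expanded_name and the one-element sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The same element, written with two different prefixes for one namespace URI.
WITH_RES = '<res:reservation xmlns:res="http://www.titan.com/Reservation"/>'
WITH_FOO = '<foo:reservation xmlns:foo="http://www.titan.com/Reservation"/>'


def expanded_name(xml_text):
    """Return the root element's name as ElementTree reports it: {uri}local."""
    return ET.fromstring(xml_text).tag


if __name__ == "__main__":
    print(expanded_name(WITH_RES))
    print(expanded_name(WITH_RES) == expanded_name(WITH_FOO))
```

A namespace-aware parser treats both documents identically; only the URI bound to the prefix matters.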

While the prefix can be any arbitrary token, the URI must be very specific. A URI (Uniform Resource Identifier) is an identifier that is a superset of the URL (Uniform Resource Locator) that you use every day to look up web pages. In most cases, people use the stricter URL format for XML Namespaces because URLs are familiar and easy to understand. The URI used in the XML Namespace declaration identifies the exact markup language that is employed. It doesn’t have to point at a web page or an XML document; it just needs to be unique to that markup language. For example, the XML Namespace used by the Address markup is different from the one used for the Reservation markup:

xmlns:addr="http://www.titan.com/Address"
xmlns:res="http://www.titan.com/Reservation"

The URI in the XML Namespace declaration should match the target namespace declared by an XML Schema. Here is the Address XSD again; note its targetNamespace attribute. The URL value of the targetNamespace attribute is identical to the URL assigned to the addr: prefix in the reservation document, shown earlier.

<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.titan.com/Address">

  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>

The targetNamespace attribute identifies the unique URI of the markup language; it is the permanent identifier for that XML Schema. Whenever elements from the Address XSD are used in some other document, the document must use an XML Namespace declaration to identify those elements as belonging to the Address markup language.

Prefixing every element in an XML document with its namespace identifier is a bit tedious, so XML Namespaces allow you to declare a default namespace that applies to all elements that are not prefixed. The default namespace is simply an XML Namespace declaration that has no prefix (xmlns="URL"). For example, we can use a default namespace in the reservation document for all Reservation elements:

<?xml version='1.0' encoding='UTF-8' ?>
<reservation xmlns="http://www.titan.com/Reservation" >
  <customer>
    <last-name>Jones</last-name>
    <first-name>Sara</first-name>

    <addr:address xmlns:addr="http://www.titan.com/Address">
      <addr:street>3243 West 1st Ave.</addr:street>
      <addr:city>Madison</addr:city>
      <addr:state>WI</addr:state>
      <addr:zip>53591</addr:zip>
    </addr:address>

    <credit-card>
      <exp-date>2005-09-30T00:00:00</exp-date>
      <number>0394029302894028930</number>
      <name>Sara Jones</name>
      <organization>VISA</organization>
    </credit-card>
  </customer>
  <cruise-id>123</cruise-id>
  <cabin-id>333</cabin-id>
  <price-paid>6234.55</price-paid>
</reservation>

None of the Reservation element names are prefixed. Any nonprefixed element belongs to the default namespace. The Address elements do not belong to the Reservation namespace, so they are prefixed to indicate which namespace they belong to. The default namespace declaration has scope; in other words, it applies to the element in which it is declared (if that element has no namespace prefix), and to all nonprefixed elements nested under that element. We can use the scoping rules of namespaces to further simplify the Reservation document by allowing the Address elements to override the default namespace with their own default namespace:

<?xml version='1.0' encoding='UTF-8' ?>
<reservation xmlns="http://www.titan.com/Reservation" >
  <customer>
    <last-name>Jones</last-name>
    <first-name>Sara</first-name>

    <address xmlns="http://www.titan.com/Address">
      <street>3243 West 1st Ave.</street>
      <city>Madison</city>
      <state>WI</state>
      <zip>53591</zip>
    </address>

    <credit-card>
      <exp-date>2005-09-30T00:00:00</exp-date>
      <number>0394029302894028930</number>
      <name>Sara Jones</name>
      <organization>VISA</organization>
    </credit-card>
  </customer>
  <cruise-id>123</cruise-id>
  <cabin-id>333</cabin-id>
  <price-paid>6234.55</price-paid>
</reservation>

The Reservation default namespace applies to the <reservation> element and all of its children except for the Address elements. The <address> element and its children have defined their own default namespace, which overrides the default namespace of the <reservation> element.
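The scoping rule can be checked with a short Python sketch. It parses a trimmed-down version of the reservation document and reports which namespace each element ends up in. The helpers namespace_of and namespaces_by_element are names invented for this example, and the sketch assumes every local element name in the document is unique.

```python
import xml.etree.ElementTree as ET

RESERVATION_NS = "http://www.titan.com/Reservation"
ADDRESS_NS = "http://www.titan.com/Address"

DOCUMENT = """<reservation xmlns="http://www.titan.com/Reservation">
  <customer>
    <last-name>Jones</last-name>
    <address xmlns="http://www.titan.com/Address">
      <street>3243 West 1st Ave.</street>
    </address>
    <credit-card>
      <name>Sara Jones</name>
    </credit-card>
  </customer>
</reservation>"""


def namespace_of(tag):
    """Return the URI part of ElementTree's '{uri}local' form ('' if none)."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def namespaces_by_element(xml_text):
    """Map each element's local name to the namespace URI it belongs to."""
    root = ET.fromstring(xml_text)
    return {el.tag.split("}", 1)[-1]: namespace_of(el.tag) for el in root.iter()}


if __name__ == "__main__":
    for name, uri in namespaces_by_element(DOCUMENT).items():
        print(name, uri)
```

The credit-card element follows the address element in the document, yet it still belongs to the Reservation namespace, because the inner default namespace ends with the closing address tag.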

Default namespaces do not apply to attributes. As a result, an attribute without a prefix belongs to no namespace at all, and any attribute that must belong to a namespace has to be prefixed with a namespace identifier. The xmlns attribute, which establishes an XML Namespace declaration, is a special case: it doesn’t need to be prefixed because it is reserved by the XML Namespaces specification itself.
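A quick Python sketch shows the effect. The status and res:agent attributes below are hypothetical; they are not part of the Reservation markup and exist only to contrast an unprefixed attribute with a prefixed one.

```python
import xml.etree.ElementTree as ET

# status is unprefixed; res:agent is bound to the Reservation namespace.
DOCUMENT = (
    '<reservation xmlns="http://www.titan.com/Reservation" '
    'xmlns:res="http://www.titan.com/Reservation" '
    'status="confirmed" res:agent="A-17"/>'
)

ROOT = ET.fromstring(DOCUMENT)
# ElementTree reports namespaced names as {uri}local and plain names as-is.
ATTRIBUTE_NAMES = set(ROOT.attrib)

if __name__ == "__main__":
    print(ROOT.tag)
    for name in ATTRIBUTE_NAMES:
        print(name)
```

The element picks up the default namespace, the unprefixed status attribute does not, and the xmlns declarations themselves are not reported as ordinary attributes.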

XML Namespaces are just URIs that uniquely identify a namespace, but do not actually point at a resource. In other words, you don’t normally use the URI of an XML Namespace to look something up. It’s usually just an identifier. However, you might want to indicate the location of the XML Schema associated with an XML Namespace so that a parser can load it and use it in validation. This is accomplished using the schemaLocation attribute:

<?xml version='1.0' encoding='UTF-8' ?>
<reservation xmlns="http://www.titan.com/Reservation"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.titan.com/Reservation
                        http://www.titan.com/schemas/reservation.xsd">
  <customer>
    <last-name>Jones</last-name>
    <first-name>Sara</first-name>

    <address xmlns="http://www.titan.com/Address"
        xsi:schemaLocation="http://www.titan.com/Address
                            http://www.titan.com/schemas/address.xsd">
      <street>3243 West 1st Ave.</street>
      <city>Madison</city>
      <state>WI</state>
      <zip>53591</zip>
    </address>

    <credit-card>
      <exp-date>2005-09-30T00:00:00</exp-date>
      <number>0394029302894028930</number>
      <name>Sara Jones</name>
      <organization>VISA</organization>
    </credit-card>
  </customer>
  <cruise-id>123</cruise-id>
  <cabin-id>333</cabin-id>
  <price-paid>6234.55</price-paid>
</reservation>

The schemaLocation attribute provides a list of values as Namespace-Location value pairs. The first value is the URI of the XML Namespace; the second is the physical location (URL) of the XML Schema. The following schemaLocation attribute states that all elements belonging to the Reservation Namespace (http://www.titan.com/Reservation) can be validated against an XML Schema located at the URL http://www.titan.com/schemas/reservation.xsd:

xsi:schemaLocation="http://www.titan.com/Reservation
                    http://www.titan.com/schemas/reservation.xsd"
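Reading the pairs back out is straightforward. The following Python sketch splits the attribute value on whitespace and pairs up the tokens; the function name schema_locations is an assumption made for this example, and the code uses the standard XML Schema instance namespace URI, which ends in a lowercase -instance.

```python
import xml.etree.ElementTree as ET

# Namespace that the xsi prefix is conventionally bound to.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOCUMENT = """<reservation xmlns="http://www.titan.com/Reservation"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.titan.com/Reservation
                        http://www.titan.com/schemas/reservation.xsd"/>"""


def schema_locations(element):
    """Return {namespace URI: schema URL} from an xsi:schemaLocation attribute."""
    tokens = element.get(f"{{{XSI}}}schemaLocation", "").split()
    # Tokens alternate: namespace, location, namespace, location, ...
    return dict(zip(tokens[0::2], tokens[1::2]))


if __name__ == "__main__":
    for namespace, location in schema_locations(ET.fromstring(DOCUMENT)).items():
        print(namespace, "->", location)
```

The sketch only extracts the hint; whether a parser actually fetches the schema from that location depends on the parser and how it is configured.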

The schemaLocation attribute is not a part of the XML language, so we need to prefix it with the appropriate namespace in order to use it. The XML Schema specification defines a special namespace that can be used for schemaLocation (as well as other attributes). That namespace is http://www.w3.org/2001/XMLSchema-instance. In order to properly declare the schemaLocation attribute, declare its XML namespace and prefix it with the identifier for that namespace as shown in the following snippet:

<?xml version='1.0' encoding='UTF-8' ?>
<reservation xmlns="http://www.titan.com/Reservation"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.titan.com/Reservation
                        http://www.titan.com/schemas/reservation.xsd">

A namespace declaration only needs to be defined once; it applies to all elements nested under the element in which it’s declared. The convention is to use the prefix xsi for the XML Schema Instance namespace (http://www.w3.org/2001/XMLSchema-instance).

XML Schemas also use XML Namespaces. Let’s look at the XML Schema for the Address markup language with a new focus on the use of XML Namespaces:

<?xml version='1.0' encoding='UTF-8' ?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.titan.com/Address"
    xmlns:addr="http://www.titan.com/Address">

  <element name="address" type="addr:AddressType"/>

  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>

In this file, namespaces are used in three separate declarations. The first namespace declaration states that the default namespace is http://www.w3.org/2001/XMLSchema, which is the namespace of the XML Schema specification. This declaration makes it easier to read the XSD because most of the elements do not need to be prefixed. The second declaration states that the target namespace of the XML Schema is the namespace of the Address markup. This tells us that all the types and elements defined in this XSD belong to that namespace. Finally, the third namespace declaration assigns the prefix addr to the target namespace so that types can be referenced exactly. For example, the top-level <element> definition uses the name addr:AddressType to say that the element is of type AddressType, belonging to the namespace http://www.titan.com/Address.
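The way a type reference such as addr:AddressType is resolved can be sketched in a few lines of Python. The functions declared_namespaces and resolve_qname are names invented for this example, and the sketch assumes that every namespace declaration sits on the root element, as it does in the Address XSD; it does not model nested scopes.

```python
import io
import xml.etree.ElementTree as ET

SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.titan.com/Address"
    xmlns:addr="http://www.titan.com/Address">
  <element name="address" type="addr:AddressType"/>
</schema>"""


def declared_namespaces(xml_text):
    """Collect prefix -> URI declarations; the default namespace has prefix ''."""
    events = ET.iterparse(io.StringIO(xml_text), events=("start-ns",))
    return {prefix: uri for _event, (prefix, uri) in events}


def resolve_qname(qname, namespaces):
    """Resolve 'prefix:local' (or an unprefixed 'local') to (URI, local)."""
    prefix, _, local = qname.rpartition(":")
    return namespaces[prefix], local


if __name__ == "__main__":
    namespaces = declared_namespaces(SCHEMA)
    print(resolve_qname("addr:AddressType", namespaces))
    print(resolve_qname("string", namespaces))
```

The unprefixed name string resolves through the default namespace to the XML Schema namespace, while addr:AddressType resolves to the Address namespace.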

Why do you have to declare a prefix for the target namespace? The reason is clearer when you examine the Reservation XSD:

<?xml version='1.0' encoding='UTF-8' ?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:addr="http://www.titan.com/Address"
    xmlns:res="http://www.titan.com/Reservation"
    targetNamespace="http://www.titan.com/Reservation">

  <import namespace="http://www.titan.com/Address"
          schemaLocation="http://www.titan.com/Address.xsd"/>

  <element name="reservation" type="res:ReservationType"/>

  <complexType name="ReservationType">
    <sequence>
      <element name="customer" type="res:CustomerType"/>
      <element name="cruise-id" type="int"/>
      <element name="cabin-id" type="int"/>
      <element name="price-paid" type="double"/>
    </sequence>
  </complexType>
  <complexType name="CustomerType">
    <sequence>
      <element name="last-name" type="string"/>
      <element name="first-name" type="string"/>
      <element name="address" type="addr:AddressType"/>
      <element name="credit-card" type="res:CreditCardType"/>
    </sequence>
  </complexType>
  <complexType name="CreditCardType">
    <sequence>
      <element name="exp-date" type="dateTime"/>
      <element name="number" type="string"/>
      <element name="name" type="string"/>
      <element name="organization" type="string"/>
    </sequence>
  </complexType>
</schema>

The Reservation XSD imports the Address XSD so that the AddressType can be used to define the CustomerType. You can see the use of namespaces in the definition of the CustomerType, which references types from both the Reservation and Address namespaces, prefixed by res and addr, respectively:

<?xml version='1.0' encoding='UTF-8' ?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:addr="http://www.titan.com/Address"
    xmlns:res="http://www.titan.com/Reservation"
    targetNamespace="http://www.titan.com/Reservation">
...
  <complexType name="CustomerType">
    <sequence>
      <element name="last-name" type="string"/>
      <element name="first-name" type="string"/>
      <element name="address" type="addr:AddressType"/>
      <element name="credit-card" type="res:CreditCardType"/>
    </sequence>
  </complexType>

Assigning a prefix to the Reservation namespace allows us to distinguish between elements whose types are defined in the Reservation namespace (e.g., credit-card) and elements whose types are defined in the Address namespace (e.g., address). The built-in types string and int belong to the XML Schema namespace, which is the default namespace in this XSD, so the type attributes that reference them don’t need a prefix. We could prefix them, though, for clarity. That is, after declaring xmlns:xsd="http://www.w3.org/2001/XMLSchema", we’d replace string and int with xsd:string and xsd:int. The prefix xsd references the XML Schema namespace; using it allows us to identify built-in types defined by XML Schema more clearly. It’s not a problem that the default namespace is the same as the namespace prefixed by xsd. By convention, xsd is the prefix used for the XML Schema namespace in most XML schemas.
