Chapter 5. Instances and schemas

There is a many-to-many relationship between instances and schemas. A schema can describe many valid instances, possibly with different root element names. Likewise, an instance may be described by many schemas, depending on the circumstances. For example, you may have multiple schemas for an instance, with different levels of validation. One may just validate the structure, while another checks every data item against a type. There may also be multiple schemas with different application information to be used at processing time. This chapter explains the interaction between schemas and instances.

5.1. Using the instance attributes

There are four attributes that can apply to any element in an instance. These four attributes, which are described in Table 5–1, are all in the XML Schema Instance Namespace, http://www.w3.org/2001/XMLSchema-instance. This namespace is commonly mapped to the prefix xsi.

Table 5–1. Instance attributes

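Attribute name                   Type              Meaning
xsi:nil                          boolean           Whether the element's value is nil (default: false)
xsi:type                         QName             The name of a type being substituted for the element's declared type
xsi:schemaLocation               list of anyURI    Locations of schema documents for designated namespaces
xsi:noNamespaceSchemaLocation    anyURI            Location of a schema document with no target namespace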

Example 5–1 shows the use of xsi:type in an instance.

Because these four attributes are globally declared, their names must be prefixed in instances. You are required to declare the XML Schema Instance Namespace and map a prefix (preferably xsi) to it. However, you are not required to specify a schema location for these four attributes. You are also not required, or even permitted, to declare xsi:type as an attribute in the type definition for number. The attributes in the XML Schema Instance Namespace, like namespace declarations, are special attributes that a schema processor always recognizes without explicit declarations.

Example 5–1. Using an instance attribute


<product xmlns="http://datypic.com/prod"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <number xsi:type="ShortProdNumType">557</number>
  <size>10</size>
</product>


In fact, the number element in this example can have a simple type, even though elements with simple types are normally not allowed to have attributes.
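To make the xsi:type substitution concrete, the following is a minimal sketch of a schema that Example 5–1 could be validated against. It is an assumption rather than the actual schema behind the example: the names ProductType and ProdNumType and the facet values are hypothetical, chosen only so that ShortProdNumType is derived from the declared type of number.

```xml
<!-- Hypothetical schema sketch for Example 5-1; only ShortProdNumType
     and the element names come from the example itself. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">

  <xs:element name="product" type="ProductType"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <!-- The declared type of number is the more general ProdNumType -->
      <xs:element name="number" type="ProdNumType"/>
      <xs:element name="size" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Assumed base type: any positive integer -->
  <xs:simpleType name="ProdNumType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Derived type named by xsi:type in the instance: at most 999 -->
  <xs:simpleType name="ShortProdNumType">
    <xs:restriction base="ProdNumType">
      <xs:maxInclusive value="999"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

Note that nothing in this sketch declares xsi:type. Under these assumptions, the instance may name ShortProdNumType because it is derived from ProdNumType, and the value 557 is then checked against the narrower type.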

5.2. Schema processing

5.2.1. Validation

Validation is an important part of schema processing. Validation determines whether an instance conforms to all of the constraints described in the schema. It involves checking all of the elements and attributes in an instance to determine that they have declarations and that they conform to those declarations and to the corresponding type definitions.

The validation process verifies:

Correctness of the data. Validating against a schema does not provide a 100% guarantee that the data is correct, but it can signal invalid formats or out-of-range values.

Completeness of the data. Validation can check that all required information is present.

Shared understanding of the data. Validation can make sure that the way you perceive the document is the same way that the sender perceives it.

Whether to validate your instances on a regular basis depends on a number of factors.

Where the instances originate. Within your organization, perhaps you have control over the application that generates instances. After some initial testing, you may trust that all documents coming from that application are valid, without performing validation. However, often the instances you are processing are originating outside your organization. You may be less likely to trust these documents.

Whether the instances were application-generated or user-generated. Human involvement can introduce typographical and other errors. Even with validating XML editors, it is still possible to introduce errors inadvertently during the handling of the documents.

Data quality. For example, if the instances are generated directly from an existing database, they may not be complete or 100% correct.

Performance. Obviously, it takes extra time to validate. If performance is critical, you may want to avoid some validation or write application-specific code that can validate more efficiently than a schema processor.

5.2.2. Augmenting the instance

In addition to validating the instance, a schema processor may alter the instance by

• Adding default and fixed values for elements and attributes

• Normalizing whitespace in element and attribute values that contain character data

Because of this, it is important that the sender and receiver of the document agree on the schema to use. If the receiver processes an element with a declaration that has a default value different from that of the sender’s declaration, it can alter the data of the element in ways unintended by the sender.
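As a rough illustration of both kinds of alteration, consider the following sketch. The declarations, the default values, and the use of xs:token are assumptions made for this illustration; they are not taken from any schema shown in this chapter.

```xml
<!-- Hypothetical receiver's schema; all names and values are assumed. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <!-- xs:token collapses whitespace: leading and trailing
             whitespace is removed and internal runs become one space -->
        <xs:element name="name" type="xs:token"/>
        <!-- An empty size element is given the default value 10 -->
        <xs:element name="size" type="xs:integer" default="10"/>
      </xs:sequence>
      <!-- If the attribute is absent, the processor supplies "US" -->
      <xs:attribute name="system" type="xs:token" default="US"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Instance as sent:
       <product><name>  Short-Sleeved   Shirt </name><size/></product>
     Instance as seen by the application after schema processing:
       <product system="US"><name>Short-Sleeved Shirt</name><size>10</size></product>
-->
```

If the sender had validated against a schema whose default for size was a different value, the two parties would see different data for the same document, which is the risk described above.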

5.3. Relating instances to schemas

Instances can be related to schemas in a number of ways.

Using hints in the instance. The xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes can be used in the instance to give the processor a hint about where to find the schema documents.

Application’s choice. Most applications will be processing the same type of instances repeatedly. These applications may already know where the appropriate schema documents are on the web, or locally, or even have them built in. In this case, the processor could either (1) ignore xsi:schemaLocation, or (2) reject documents containing xsi:schemaLocation attributes, or (3) reject documents in which the xsi:schemaLocation does not match the intended schema document.

User’s choice. The location of the schema document(s) can be specified, at processing time, by a command-line instruction or user dialog.

Dereferencing the namespace. The namespace name can be dereferenced to retrieve a schema document or resource directory. However, this is not typically done by XML Schema processors.

5.3.1. Using hints in the instance

XML Schema provides two attributes that act as hints to where the processor might find the schema document(s) for the instance. Different processors may ignore or acknowledge these hints in different ways.

These two attributes are: xsi:schemaLocation, for use with schema documents that have target namespaces, and xsi:noNamespaceSchemaLocation, for use with schema documents without target namespaces.

5.3.1.1. The xsi:schemaLocation attribute

The xsi:schemaLocation attribute allows you to specify a list of pairs that match namespace names with schema locations. Example 5–2 shows an instance that uses xsi:schemaLocation. The default namespace for the document is http://datypic.com/prod. The xsi prefix is assigned to the XML Schema Instance Namespace, so that the processor will recognize the xsi:schemaLocation attribute. Then, the xsi:schemaLocation attribute is specified to relate the namespace http://datypic.com/prod to the schema location prod.xsd.

Example 5–2. Using xsi:schemaLocation


<product xmlns="http://datypic.com/prod"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod prod.xsd">
  <number>557</number>
  <size>10</size>
</product>


The value of the xsi:schemaLocation attribute is actually at least two values separated by whitespace. The first value is the namespace name (in this example http://datypic.com/prod), and the second value is the URI for the schema location (in this example prod.xsd, a relative URI). The processor will retrieve the schema document from the schema location and make sure that its target namespace matches the namespace it is paired with in xsi:schemaLocation.

Since spaces are used to separate values in this attribute, you should not have spaces in your schema location path. You can replace a space with %20, which is standard for URLs. For example, instead of my schema.xsd, use my%20schema.xsd. To use an absolute path rather than a relative path, some processors require that you start your schema location with file:/// (with three forward slashes), as in file:///C:/Users/PW/Documents/prod.xsd.
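The two conventions just described can be combined, as in the following sketch. The file name my schema.xsd and its placement in the directory mentioned above are assumptions made for illustration.

```xml
<!-- Absolute file URI with an escaped space; the location is hypothetical -->
<product xmlns="http://datypic.com/prod"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod
                             file:///C:/Users/PW/Documents/my%20schema.xsd">
  <number>557</number>
  <size>10</size>
</product>
```

The line break inside the attribute value is harmless, because any whitespace separates the namespace name from the schema location.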

If multiple namespaces are used in the document, xsi:schemaLocation can contain more than one pair of values, as shown in Example 5–3.

Example 5–3. Using xsi:schemaLocation with multiple pairs


<order xmlns="http://datypic.com/ord"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://datypic.com/prod prod.xsd
                           http://datypic.com/ord ord1.xsd">
  <items>
    <product xmlns="http://datypic.com/prod">
      <number>557</number>
      <size>10</size>
    </product>
  </items>
</order>


If you have a schema document that imports schema documents with different target namespaces, you do not have to specify schema locations for all the namespaces (if the processor has some other way of finding the schema documents, such as the schemaLocation attribute of import). For example, if ord1.xsd imports prod.xsd, it is not necessary to specify prod.xsd in the xsi:schemaLocation in the instance. You do still need to declare your namespaces using the xmlns attributes, as shown in the example.
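A sketch of what such an importing schema document might look like is shown below. The content of ord1.xsd is not given in this chapter, so the type names OrderType and ItemsType and the overall structure are assumptions, inferred only from the instance in Example 5–3.

```xml
<!-- Hypothetical ord1.xsd: imports prod.xsd so that the instance
     needs a schema location hint only for the ord namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">

  <!-- The schemaLocation here tells the processor where to find
       the declarations for the prod namespace -->
  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="prod.xsd"/>

  <xs:element name="order" type="OrderType"/>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="items" type="ItemsType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ItemsType">
    <xs:sequence>
      <!-- Reference to the global product declaration in prod.xsd -->
      <xs:element ref="prod:product" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

With a schema document along these lines, the xsi:schemaLocation value in the instance could be reduced to the single pair http://datypic.com/ord ord1.xsd, while the xmlns declarations for both namespaces stay in place.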

It is not illegal to list two or more pairs of values that refer to the same namespace. In Example 5–3, you could refer to both ord1.xsd and ord2.xsd, repeating the same namespace name for each. However, this is not recommended because many processors will ignore all but the first schema location for a particular namespace.

It is generally a good practice to use one main schema document that includes or imports all other schema documents needed for validation. This simplifies the instance and makes name collisions more obvious.

The xsi:schemaLocation attribute may appear anywhere in an instance, in the tags of any number of elements. Its appearance in a particular tag does not signify its scope. However, it must appear before any elements that it would validate. It is most common to put the xsi:schemaLocation attribute on the root element, for simplicity.

5.3.1.2. The xsi:noNamespaceSchemaLocation attribute

The xsi:noNamespaceSchemaLocation attribute is used to reference a schema document that has no target namespace. Unlike xsi:schemaLocation, it does not take a list of values; only one schema location may be specified. Example 5–4 shows the use of xsi:noNamespaceSchemaLocation in an instance.

Example 5–4. Using xsi:noNamespaceSchemaLocation


<product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="prod.xsd">
  <number>557</number>
  <size>10</size>
</product>


It is legal according to XML Schema to have both xsi:noNamespaceSchemaLocation and xsi:schemaLocation specified, but once again, you should check with your processor to see what it will accept.
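An instance that uses both attributes might look like the following sketch. The envelope element and the schema document envelope.xsd are hypothetical; they stand for any element in no namespace that contains elements from a namespace.

```xml
<!-- envelope is in no namespace (hypothetical); product is in the
     http://datypic.com/prod namespace, as in the earlier examples -->
<envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="envelope.xsd"
          xsi:schemaLocation="http://datypic.com/prod prod.xsd">
  <product xmlns="http://datypic.com/prod">
    <number>557</number>
    <size>10</size>
  </product>
</envelope>
```

Here xsi:noNamespaceSchemaLocation provides the hint for envelope, and xsi:schemaLocation provides the hint for the elements in the prod namespace.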

5.4. The root element

Sometimes you want to be able to specify which element declaration is for the root element of the instance. For example, you may not want the document shown in Example 5–5 to be considered a valid instance, although the element itself is valid according to its declaration.

Example 5–5. A valid instance?


<number>557</number>


Schemas work similarly to DTDs in this regard. There is no way to designate the root. Any element conforming to a global element declaration can be a root element for validation purposes.

You can work around this by having only one global element declaration. If the number declaration is local, Example 5–5 is not valid on its own. However, there are times when you cannot avoid global element declarations, either because you are using substitution groups or because you are importing element declarations over which you have no control. A better approach is to have the application verify that the root element is the one you expect.
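The single-global-declaration workaround could be sketched as follows. This is an illustrative schema with no target namespace, not one taken from the examples in this chapter; the choice of xs:integer for number and size is an assumption.

```xml
<!-- Hypothetical schema with exactly one global element declaration -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- product is the only global declaration, so it is the only
       element that can serve as the root for validation -->
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <!-- number and size are declared locally; a document whose
             root is number has no global declaration to match -->
        <xs:element name="number" type="xs:integer"/>
        <xs:element name="size" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Against a schema like this one, the document in Example 5–5 would not be considered valid on its own, while a product element containing number and size would be.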

With some schema processors, validation does not necessarily start at the root. It is possible to validate sections of instance documents with different schema documents using different xsi:schemaLocation hints, or to validate fragments of instance documents identified by IDs or XPointer expressions. Also, one schema document may describe several related types of instance documents (e.g., purchase orders and invoices), which may have different root elements.
