Schema Validation and Type Assignment

Adding a schema to the ISSD does not automatically cause any input documents or result XML to be validated or annotated with types. There are two occasions during query evaluation when schema validation may occur:

The first is when an input document is opened, for example using the doc or collection function. Depending on the implementation, the processor may validate the input document at this time. However, a processor is not required to automatically validate input documents, even if it supports XML Schema. It can choose the way it finds and selects schemas for the input document. Additionally, the processor is not required to stop evaluating the query if an input document is found to be invalid but still well formed. You should consult the documentation for your XQuery implementation to determine how it handles these choices.

If you're relying on an input document being prevalidated in this way, it's a good idea to declare this. For example, you can write:

declare variable $in as document-node(schema-element(catalog)) := doc("catalog.xml");

This causes the query to fail if the validation hasn't been done (or if validation failed). It also tells the query compiler what the expected type of $in is, which is useful information for optimization and error checking.

If an input document is validated, the definitions used must be consistent with any definitions added to the ISSD. For example, if your input document is a catalog.xml document that was validated using catalog.xsd, you cannot then import a different catalog schema that has conflicting definitions.
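
As a minimal sketch of how these pieces might fit together, the following prolog imports a schema and then declares the typed variable. The location catalog.xsd is hypothetical, the schema is assumed to have no target namespace, and the processor is assumed to be schema-aware:

import schema default element namespace "" at "catalog.xsd";

(: raises a type error unless doc() returns a document whose catalog element was validated :)
declare variable $in as document-node(schema-element(catalog)) := doc("catalog.xml");

$in//product

Without the import, the catalog element declaration would not be in the ISSD, and the reference to schema-element(catalog) would itself be an error.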

The second occasion is when a validate expression is used to explicitly validate documents and elements.

The Validate Expression

A validate expression can be used to validate a document or element node, which may come from an input document or be constructed in the query. It will validate the node according to a schema declaration if that declaration is in scope (i.e., if it is in the ISSD). For example:

validate strict { <product dept="ACC">
  <number>563</number>
  <name language="en">Floppy Sun Hat</name>
</product> }

validates the product element using a global product element declaration from the in-scope schema definitions, if one exists. This includes validating its attributes and descendants.

The syntax of a validate expression is shown in Figure 13-2.

Figure 13-2. Syntax of a validate expression

The expression to be validated must be either a single element node or a single document node that has exactly one element child.

The value of a validate expression is a new document or element node (with a new identity) annotated with the appropriate type indicated in the element declaration.

As with all schema validation, it also fills in default or fixed values and normalizes whitespace. When a document node is being validated, full schema validation is performed. When an element node is being validated, certain validation constraints are skipped. These omitted constraints include identity constraint (key) validation, checking xs:ID values for uniqueness, and ensuring that xs:ENTITY, xs:NOTATION, and xs:IDREF values have matching entities, notations, and IDs.
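
The following sketch validates a whole document and shows that the result is a copy rather than the original node. It assumes that a schema describing catalog.xml has been imported and that the processor supports the validate expression:

let $original := doc("catalog.xml")
let $validated := validate strict { $original }
(: validate returns a new node, so the identity comparison is false :)
return $validated is $original

Because a document node is passed here, full validation applies, including the identity constraint and ID/IDREF checks that are skipped when only an element node is validated.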

Important

Not all implementations support the validate expression; it is an optional feature.

Validation Mode

The validation mode controls how strictly an element or document is validated. There are two possible validation modes:

strict

When the mode is strict, the processor requires that a declaration be present for the element in the validate expression and that the element be valid according to that declaration. If the element is not valid or a declaration cannot be found for it, an error is raised.

lax

When it is lax, the processor validates the element if it can find a declaration for it. It may not be able to find declarations if, for example, the schema was not imported or provided by the processor. If a declaration is found, the element or attribute must be valid according to it, or an error is raised. If no declaration is found, the processor will attempt to recursively validate the element's children and attributes, and the process repeats. If no declarations are found in the entire tree or no validation errors are encountered, no error is raised.

In a validate expression, the validation mode can be specified just after the validate keyword. For example:

validate lax
 {<number>563</number>}

results in lax validation on the number element. If it is not specified, the default mode strict is used.
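
To illustrate the difference between the two modes, here are two separate sketches. Both assume that no declaration for an element named note is in scope; note is a hypothetical name chosen only for illustration:

(: lax: no declaration is found for note, so no error is raised :)
validate lax { <note>563</note> }

(: strict: the missing declaration for note causes an error :)
validate strict { <note>563</note> }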

Assigning Type Annotations to Nodes

It is worth taking a closer look at how the validation process assigns type annotations to elements and attributes. If the node is valid according to the type designated in its declaration, the assignment is usually quite straightforward: the node is annotated with that type. For example, the product element in the earlier validate strict example would be assigned the type ProductType. However, there are some special cases.

An element or attribute is assigned a generic type (xs:untyped for elements, and xs:untypedAtomic for attributes) if:

  • No schema validation was attempted.

  • It was not validated because it was included as part of a wildcard (xs:any or xs:anyAttribute) that does not require validation.

  • It is the result of an element constructor, and construction mode is set to strip. Construction mode is described in "Types and Newly Constructed Elements and Attributes," later in this chapter.

  • It is the result of a validate expression, but it was not validated against an in-scope schema definition. This might happen if the validation mode is lax with no relevant declaration in scope.

Another generic type, xs:anyType, is used in a few other cases. The difference between xs:anyType and xs:untyped is that an element of type xs:anyType may contain other elements that have specific types. Elements of type xs:untyped, on the other hand, always have children that are untyped. An element is assigned the type xs:anyType if:

  • When an input document was accessed, validation was attempted but the element was found to be invalid (or partially valid). Some implementations may allow the query evaluation to continue even if validation fails.

  • It is the result of an element constructor, and construction mode is set to preserve.

A node that is declared to have a union type is assigned the specific member type for which it was validated (which is the first one to which it conforms). For example, if the <a>12</a> element is validated using a union type whose member types are xs:integer and xs:string, in that order, it is assigned the type xs:integer, not the union type itself.

An element that uses the XML Schema attribute xsi:type for type substitution is assigned the type specified by xsi:type if it is valid according to that type definition.
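
One way to check which annotation a node received is an instance of test. The sketch below assumes a schema-aware processor and a schema with no target namespace, imported from the hypothetical location catalog.xsd, in which product has the type ProductType and number is declared as xs:integer:

import schema default element namespace "" at "catalog.xsd";

let $p := validate strict {
  <product dept="ACC">
    <number>563</number>
    <name language="en">Floppy Sun Hat</name>
  </product> }
return (
  $p instance of element(product, ProductType),
  $p/number instance of element(number, xs:integer)
)

Under those assumptions, both tests should be true.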

Nodes and Typed Values

In most cases, you can retrieve the typed value of an element or attribute using the data function. Usually, it is simply the string value of the node, cast to the type of the element or attribute. For example, if the number element has the type xs:integer, the string value is 784 (type xs:string), while the typed value is 784 (type xs:integer). If the number element is untyped, its typed value is 784 (type xs:untypedAtomic).

There are two exceptions to this rule:

  • Elements whose types have element-only content (that is, they allow only children) do not have typed values, even if that particular element does not have any children, so attempting to retrieve the typed value raises an error. For example, the product element does not have a typed value if it is annotated with a type other than xs:untyped.

  • The typed value of an element or attribute whose type is a list type is a sequence of atomic values, one for each item in the list. For example, if the element <colorChoices>navy black</colorChoices> has a type that is a list of strings, the typed value is a sequence of two strings, navy and black.

An element's typed value will be the empty sequence in two cases:

  • Its type is a complex type with an empty content model.

  • It has been nilled, meaning that it has an attribute xsi:nil="true".

The typed value will not be the empty sequence just because the element has no content. For example, the typed value of <name></name> is the value "" (type xs:untypedAtomic) if name is untyped, and a zero-length string (type xs:string) if name has the type xs:string.
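
A few of these cases can be tried without any schema. The sketch below uses only constructed, unvalidated elements, so every typed value it retrieves is of type xs:untypedAtomic:

let $number := <number>784</number>
return (
  string($number) instance of xs:string,       (: true: the string value :)
  data($number) instance of xs:untypedAtomic,  (: true: the typed value of an unvalidated element :)
  data(<name></name>) eq "",                   (: true: a zero-length value, not the empty sequence :)
  count(data(<name></name>))                   (: 1 :)
)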

A summary of the rules for the typed values of elements and attributes appears in Table 13-2.

Table 13-2. Typed values of elements and attributes

Kind of node | Typed value | Type of typed value
-------------|-------------|--------------------
An untyped element | The character data content of the element and all its descendants | xs:untypedAtomic
An element whose type is a simple type, or a complex type with simple content | The character data content of the element | The type of the element's content
An element whose type has mixed content | The character data content of the element and all its descendants | xs:untypedAtomic
An element whose type has element-only content | Error | N/A
An element whose type has empty content | ( ) | N/A
An untyped attribute | The attribute value | xs:untypedAtomic
A typed attribute | The attribute value | The type of the attribute
An element or attribute whose type is a list type | A sequence containing the values in the list | The list type's item type

Types and Newly Constructed Elements and Attributes

Newly constructed nodes don't automatically take on the type of their content. For example, the expression <abc>{2}</abc> does not create an abc element whose type annotation is xs:integer just because its content is of type xs:integer. In fact, element constructors that have simple content, as in this example, are always annotated with xs:untyped unless they are enclosed in a validate expression.

The type of a newly constructed element with complex content is also generic, but any children it copies from an input document may or may not retain their original types from the input document. This is determined by construction mode, which can have one of two values: strip or preserve. If construction mode is strip, the type of the newly constructed element, and all of its descendants, is xs:untyped. If the element is contained in a validate expression, it may then be annotated with a new schema type.

If construction mode is preserve, the type of the newly constructed element is xs:anyType, and all of its copied children retain their original types from the input document.

For example, suppose you construct a productList element with the following expression:

<productList>{doc("catalog.xml")//product}</productList>

Suppose also that the catalog.xml document has been validated with a schema, and the product elements from this document are annotated with the type ProductType. This query will result in a productList element that contains four product elements.

If construction mode is strip, both productList and all the product elements in the results will be annotated with xs:untyped. If construction mode is preserve, productList will be annotated with xs:anyType, and the product elements will be annotated with ProductType.

Construction mode is set using a construction declaration, which may appear in the query prolog. Its syntax is shown in Figure 13-3.

Figure 13-3. Syntax of a construction declaration
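
For instance, a prolog that sets construction mode to preserve for the productList example might look like the following sketch. It assumes that the processor validates catalog.xml when it is opened, so that the product elements carry the type ProductType, and that the schema has no target namespace and is imported from the hypothetical location catalog.xsd:

declare construction preserve;
import schema default element namespace "" at "catalog.xsd";

let $list := <productList>{doc("catalog.xml")//product}</productList>
return (
  $list instance of element(*, xs:untyped),           (: false: productList is xs:anyType :)
  $list/product instance of element(*, ProductType)+  (: true if the copied products kept their types :)
)

Changing the first line to declare construction strip; should reverse both results, because productList and the copied product elements would then all be annotated with xs:untyped.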
