Chapter 11. A Closer Look at Types

Chapter 2 briefly introduced the use of types in XQuery. This chapter delves deeper into the XQuery type system and its set of built-in types. It explains the automatic type conversions performed by the processor and describes the expressions that are relevant to types, namely type constructors, cast and castable expressions, and instance of expressions.

The XQuery Type System

XQuery is a strongly typed language, meaning that each function and operator expects its arguments or operands to be of a particular type. This means, for example, that you cannot perform arithmetic operations on strings without explicitly telling the processor to treat your strings like numbers. In this respect, XQuery is similar to some common object-oriented programming languages, like Java and C#, and unlike most scripting languages, like JavaScript, which automatically coerce values to the appropriate type.

Advantages of a Strong Type System

There are several advantages to a strong type system. One of them is the early and reliable identification of errors in a query. Potential errors in the query can be determined statically before the query is even executed. For example, if you are trying to double a value that is a string (e.g., a product name), there is probably an error in the query. In addition, a strict type system allows for the identification of errors in the values of input data. This identification of errors can make queries easier to debug, and results in more reliable queries that are able to handle a variety of input data. This is especially true if schemas are used, because schema types can help identify possible errors. A schema allows the processor to tell you that the product name is a string and that you should not be trying to double it. Based on a schema, the processor can also tell you when you’ve specified a path that will never return any elements—for example, because of a misspelling or an invalid chain of steps.

Another advantage of a strong type system is optimization. Implementations can optimize performance if they know more about the types of data. This too is especially true if schemas are used, because schema types can help a processor find specific elements. If your schema says that all number elements appear as children of product elements, your processor only has to look in one place for the number elements you have requested in your query. If it knows that there is always only one number per product, it can further optimize certain comparison operations.

A strong type system has its disadvantages, too. One is that it can complicate query authoring, because you must pay more attention to types. For example, if you want to treat a numeric value like a string, you have to explicitly cast it to xs:string in order to perform string-related operations. Also, supporting an extensive type system can put a burden on implementers of the standard. This is why the more complex features (schema awareness and static typing) are optional features of the standard that are not available in all implementations.

Do You Need to Care About Types?

If you do not use schemas, your input data will be untyped. Usually, this means that you, as a query author, do not need to be especially concerned about types. Because of the type conversions described in “Automatic Type Conversions”, the processor will usually “do the right thing” with your data.

For example, you may pass an untyped price element to the round function, or multiply it by two. In these cases, the processor will automatically assume that the content of the price element is numeric, and convert it to a numeric type. Likewise, if you call the substring function with a name element, the processor will assume that name contains a string.

There is the occasional “gotcha,” though. One example is comparing two untyped values by using general comparison operators (e.g., < or =). If the values are untyped, they are compared as strings. Therefore, if you compare the untyped price element <price>123.99</price> with the untyped price element <price>99.99</price>, the second will be considered greater because the string value starts with a greater digit. Similarly, order by clauses in FLWORs assume that untyped values are strings rather than numbers. In both of these cases, the prices need to be explicitly converted to numbers in order to be sorted or compared as numbers. Casting is described in “Constructors and Casting”.
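The comparison gotcha can be reproduced with a minimal sketch. It assumes the two price elements are constructed inline and are not validated against any schema, so their values are untyped; the variable names are illustrative only.

```xquery
let $a := <price>123.99</price>
let $b := <price>99.99</price>
return (
  (: both operands are untyped, so they are compared as strings :)
  $a < $b,
  (: explicit conversion makes the comparison numeric :)
  xs:decimal($a) < xs:decimal($b)
)
```

The first comparison returns true (the string "123.99" sorts before "99.99"), while the second returns false.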

With untyped text values, you need to be careful when using the max and min functions. These two functions treat untyped data as if it were numeric, casting it to xs:double. Therefore, the expression:

max(doc("catalog.xml")//name)

will raise error FORG0001. Instead, you need to cast the names to xs:string. One way to do this is to use the string function, as in:

max(doc("catalog.xml")//name/string())

If you do use schemas, you will be able to get more of the benefits of strong typing, but you will need to pay more attention to types when writing your query. Unlike some weakly typed languages, XQuery will not automatically convert values of one type to an unrelated type (for example, a string to a number). So, if your schema for some reason declares the price element to be of type xs:string, you will not be able to perform arithmetic operations, or call functions like round, on your price without explicitly casting it to a numeric type.

The Built-in Types

A wide array of simple types is built into XQuery. All the built-in types are covered individually in detail in Appendix B, with a description, lexical representations, and examples. In practice, you are likely to need only a handful of these built-in types. The XML Schema type system divides simple types into three varieties: atomic types, list types, and union types.

Atomic Types

The atomic types, shown in Figure 11-1, represent common datatypes such as strings, dates, and times. The built-in types are identified by qualified names that are prefixed with xs, because they are defined in the XML Schema namespace. You can use all of these built-in types in your queries regardless of whether the implementation is actually schema-aware, and whether or not you are using schemas to validate your source or result documents.

Nineteen of the built-in types are primitive, meaning that they are the top level of the type hierarchy. Each primitive type has a value space, which describes all its valid values, and a set of lexical representations for each value in the value space. There is one lexical representation, the canonical representation, that maps one-to-one with each value in the value space. The canonical representation is important because it is the format used when a value is serialized or cast as a string.

For example, the type xs:integer has a value that is equal to 12 in its value space. This value has multiple lexical representations that map to the same value, such as 12, +12, and 012. The canonical representation is 12. Some primitive types, such as xs:string, have only one lexical representation for each value, which is therefore also the canonical representation.
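As a small illustration of lexical versus canonical representations, the following sketch constructs the same xs:integer value from two different lexical forms and then converts it back to a string:

```xquery
(
  (: two lexical representations of the same value :)
  xs:integer("+12") eq xs:integer("012"),
  (: casting to a string produces the canonical representation :)
  string(xs:integer("012"))
)
```

This returns true followed by the string "12".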

The rest of the built-in types are derived (directly or indirectly) from one of the primitive types. The derived built-in types (and indeed, user-defined types) inherit the qualities of the primitive type from which they are derived, including their value space (possibly restricted), lexical representations, and canonical representations. A value of a derived type can also be used wherever a value of its base type is expected. For example, the insert-before function expects a value of type xs:integer for its second argument. Nevertheless, it accepts a value of any type derived from xs:integer, such as xs:positiveInteger or xs:long.

Figure 11-1. The atomic type hierarchy

At the top of the built-in atomic type hierarchy is xs:anyAtomicType. This type encompasses all the other atomic types. No values ever actually have the type xs:anyAtomicType; they always have a more specific type. However, this type name can be used as a placeholder for all other atomic types. For example, the distinct-values function signature specifies that its first argument is xs:anyAtomicType*. This means that a sequence of atomic values of any type can be passed to this function.

List Types

A list type represents a list of possibly multiple atomic values of a particular type, known as its item type. There are three list types built into the type system: xs:IDREFS, xs:NMTOKENS, and xs:ENTITIES. They are defined as list types with item types xs:IDREF, xs:NMTOKEN, and xs:ENTITY, respectively. It is also possible to define new list types in a schema. List types are treated somewhat differently from atomic types in XQuery because, for example, they cannot appear in sequence types. However, other type-related features such as type constructors and casting are available for list types.

Union Types

A union type allows a value to be a choice among several different types, known as its member types. There is one union type built into the type system, xs:numeric, which is defined as the union of the three primitive numeric types: xs:double, xs:float, and xs:decimal. This is a convenient way of allowing certain functions and operators, for example the round function, to accept values of any numeric type. It is also possible to define new union types in a schema. Union types whose members are atomic types (like xs:numeric) are treated much like atomic types in XQuery.

Types, Nodes, and Atomic Values

Element and attribute nodes, as well as atomic values, all have types associated with them. Sequences don’t technically have types, although they can be matched to sequence types, as described later in this chapter.

Nodes and Types

All element and attribute nodes have type annotations, which indicate the type of their content. An element or attribute can come to be annotated with a specific type when it is validated against a schema. This might occur when the document is first opened, or as the result of a validate expression. Schema validation is discussed further in Chapter 14.

If an element or attribute has not been validated and does not have a specific type, it is automatically assigned a generic type, namely xs:untyped (for elements) or xs:untypedAtomic (for attributes). Sometimes these nodes are referred to as untyped, despite the fact that they do have a type, albeit a generic one.

Attributes, and most elements, also have a typed value. This typed value is an atomic value extracted from the node, taking into account the node’s type annotation. For example, if the number element has been validated and given the type xs:integer, its typed value is 784 (type xs:integer). If the number element is untyped, its typed value is 784 (type xs:untypedAtomic). The data function allows you to retrieve the typed value of a node.
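A minimal sketch of the untyped case follows. It assumes the number element is constructed inline and has not been validated against a schema.

```xquery
let $number := <number>784</number>
return (
  (: the typed value of an unvalidated element is untyped :)
  data($number) instance of xs:untypedAtomic,
  data($number) instance of xs:integer,
  (: an explicit constructor yields a value of a specific type :)
  xs:integer($number) instance of xs:integer
)
```

This returns true, false, true.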

Atomic Values and Types

Every atomic value has a type. An atomic value might have a specific type because:

  • It is extracted from an element or attribute that has a type annotation. This can be done explicitly using the data function, or automatically using many functions and operators.

  • It is the result of a constructor function or a cast expression.

  • It is the value of a literal expression. Literals surrounded by single or double quotes are considered to have the type xs:string, whereas non-quoted numeric values have the type xs:integer, xs:decimal, or xs:double, depending on their format.

  • It is the result of an expression or function that returns a value of a particular type—for example, a comparison expression returns an xs:boolean value, and the count function returns an xs:integer.
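The types assigned to literals and to the results of common expressions can be checked directly, as in this short sketch, which uses the instance of expression described later in this chapter:

```xquery
(
  "784"   instance of xs:string,    (: quoted literal :)
  784     instance of xs:integer,   (: no decimal point :)
  7.84    instance of xs:decimal,   (: decimal point, no exponent :)
  7.84E2  instance of xs:double,    (: exponent notation :)
  (1 = 1) instance of xs:boolean,   (: result of a comparison :)
  count((1, 2, 3)) instance of xs:integer
)
```

Every item in the resulting sequence is true.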

A value might not have a specific type if it was extracted from an untyped element or attribute. In this case, it is automatically assigned the generic type xs:untypedAtomic. Untyped atomic values can be used wherever a typed value can be used, and they are usually cast to the required type automatically. This is because every function and expression has rules for casting untyped values to an appropriate type.

Type Checking in XQuery

Because XQuery is a strongly typed language, an XQuery processor verifies that all items are of the appropriate type and raises type errors when they are not. There are two phases to processing a query: the static analysis phase and the dynamic evaluation phase, both of which have type-checking components.

The Static Analysis Phase

During the static analysis phase, the processor checks the query itself, along with any related schemas, for static errors, without regard to the input documents. It is roughly equivalent to compiling the query; that is, checking for syntax errors and other errors that will occur regardless of the input document. The processor raises static errors during the static analysis phase. Examples of static errors include:

  • Syntax errors, such as invalid keywords or mismatched brackets

  • Referring to a variable or calling a function that has not been declared

  • Using namespace prefixes that are not declared

Some implementations support an optional static typing feature, which means that they evaluate the types of expressions in a query during the static analysis phase. This allows errors in the query to be caught early and more reliably, and can help optimize queries. A number of expressions, functions, and syntactic constructs are available solely to support static typing. These are discussed in Chapter 15.

Implementations that don’t claim to support the static typing feature might also do static analysis in order to reduce the amount of runtime type checking needed. It’s always a good idea to declare the types of your variables, function parameters, and function return types to give the processor as much information as possible.

The Dynamic Evaluation Phase

During the dynamic evaluation phase, the processor checks the query again, this time with the data from the input document. Some expressions that did not result in errors during the analysis phase will in fact result in errors during the evaluation phase. For example, the expression:

sum(doc("catalog.xml")//number)

might pass the static analysis phase if number is untyped; the processor has no way of knowing whether all the contents of the number elements will be numeric values. However, it will raise a dynamic error in the evaluation phase if any of the number elements contains a value that cannot be cast to a numeric type, such as the string abc.
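The dynamic error can be observed, and handled, with a try/catch expression. This sketch assumes a processor that supports XQuery 3.0 or later; the inline number elements stand in for the catalog document, and the message text is illustrative.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: hypothetical stand-in for the number elements in catalog.xml :)
let $numbers := (<number>557</number>, <number>abc</number>)
return
  try { sum($numbers) }
  catch err:FORG0001 { "a number element is not numeric" }
```

Because abc cannot be cast to a numeric type, the catch clause is evaluated and the message string is returned.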

Automatic Type Conversions

In XQuery, each function and operator expects its arguments to be of a particular type. However, this is not as rigid as it may sound because there are a number of type conversions that happen automatically. They are discussed in this section.

Subtype Substitution

Functions and operators that expect a value of a particular type also accept a value of one of its derived types. This is known as subtype substitution. For example, the upper-case function expects an xs:string as an argument, but you can pass a value whose type is derived by restriction from xs:string, such as xs:NMTOKEN. This also works for complex types defined in schemas. A function expecting an element of type ProductType also accepts an element of type UmbrellaType, if UmbrellaType is derived by restriction from ProductType. Note that the value retains its original type; it is not actually cast to another type.
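Subtype substitution for atomic values can be sketched as follows; the token value is an arbitrary example.

```xquery
let $token := xs:NMTOKEN("acc-563")
return (
  (: upper-case expects xs:string but accepts the derived type :)
  upper-case($token),
  (: the value keeps its original type and also matches the base type :)
  $token instance of xs:NMTOKEN,
  $token instance of xs:string
)
```

This returns "ACC-563", true, true.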

Type Promotion

When two values of different numeric types are compared or used in the same operation, one is promoted to the type of the other. An xs:decimal value can be promoted to the xs:float or xs:double type, and an xs:float value can be promoted to the xs:double type. For example, the expression 1.0 + 1.2E0 adds an xs:decimal value (1.0) to an xs:double value. The xs:decimal value is promoted to xs:double before the expression is evaluated. Numeric type promotion happens automatically in arithmetic expressions, comparison expressions, and function calls.

In addition, values of type xs:anyURI are automatically promoted to xs:string in comparison expressions and function calls. Unlike subtype substitution, type promotion results in the type of a value changing.
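The effect of promotion on result types can be checked with a few expressions. This is a sketch only; the URI is a placeholder.

```xquery
(
  (: xs:decimal + xs:double: the decimal is promoted to xs:double :)
  (1.0 + 1.2E0) instance of xs:double,
  (: xs:float + xs:decimal: the decimal is promoted to xs:float :)
  (xs:float(1) + 1.5) instance of xs:float,
  (: xs:anyURI is promoted to xs:string for the comparison :)
  xs:anyURI("http://example.com") eq "http://example.com"
)
```

All three expressions return true.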

Casting of Untyped Values

In some cases, an untyped value is automatically cast to a specific type. This occurs in function calls, as well as in comparison and arithmetic expressions. For example, if you call the upper-case function with an untyped value, it is automatically cast to xs:string. If you add an untyped value to a number, as in <a>3</a> + 2, the untyped value 3 is cast to xs:double, and the expression returns 5 (as an xs:double value).

Note that typed values are not automatically cast. For example, "3" + 2 will not automatically cast the string 3 to the number 3, even though this is theoretically possible. One exception is the concat function, which automatically casts its arguments to strings. But that’s special behavior of this particular function, not something that happens implicitly on the function call.
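The contrast between untyped and typed values can be sketched like this, assuming the a element is not schema-validated:

```xquery
(
  (: the untyped content of a is cast automatically :)
  <a>3</a> + 2,
  (: a string is typed, so it must be cast explicitly;
     writing "3" + 2 instead raises type error XPTY0004 :)
  xs:integer("3") + 2,
  (: concat converts its arguments to strings itself :)
  concat("3", 2)
)
```

The three expressions return 5, 5, and the string "32".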

Atomization

Atomization occurs when a function or operator expects an atomic value and receives a node instead. Specifically, it is used in:

  • Arithmetic operations

  • Comparisons

  • Function calls and returns

  • Cast expressions and constructors

  • Name expressions in computed constructors

  • Switch expressions

Atomization involves extracting the typed value of one or more elements or attributes to return one or more atomic values. For example:

<e1>3</e1> + 5

returns the value 8 because the value 3 is extracted from the e1 element during atomization. Also:

substring(<e2>query</e2>, 2, 3)

returns uer because the string query is extracted from the e2 element. These two examples work if e1 and e2 are untyped, because their so-called typed values would be instances of xs:untypedAtomic, and would be cast to the type required by the operation. They would work equally well if e1 had the type annotation xs:integer, and e2 had the type annotation xs:string, in which case no casting would need to take place.

Effective Boolean Value

It is often useful to treat a sequence as a Boolean value. For example, if you want to determine whether your catalog element contains any products whose price is less than 20, you might use the expression:

if (doc("prices.xml")//prod[price < 20])
then <bargain-bin>...</bargain-bin>
else ()

In this case, the result of the path expression doc("prices.xml")//prod[price < 20] is a sequence of elements that match the criteria. However, the test expression (after if) simply needs a true/false answer regarding whether there are any elements that match the criteria. Here, the sequence is automatically converted to its effective boolean value, which in this case indicates whether the sequence contains at least one element.

Sequences are automatically interpreted as Boolean values in:

  • Conditional (if-then-else) expressions

  • Logical (and/or) expressions

  • where clauses of FLWORs

  • Quantified (some/every) expressions

  • The argument to the not function

  • The predicates of path expressions

In addition, the boolean function can be used to explicitly convert a sequence to its effective boolean value. The effective boolean value of a sequence is false if it is:

  • The empty sequence

  • A single, atomic value of type xs:boolean that is equal to false

  • A single, atomic value of type xs:string, xs:anyURI, or xs:untypedAtomic that is a zero-length string ("")

  • A single, atomic value with a numeric type that is equal to 0 or NaN

The effective boolean value cannot be determined for a sequence of more than one item whose first item is an atomic value, or for an individual atomic value whose type is not numeric, untyped, xs:boolean, xs:string, or xs:anyURI. It is also not defined for function items, including maps and arrays. If the processor attempts to evaluate the effective boolean value in these cases, error FORG0006 is raised.

In all other cases, the effective boolean value is true. This includes a sequence of one or more items whose first item is a node or a single atomic value other than those described in the preceding list. Table 11-1 shows some examples.

Table 11-1. Examples of effective boolean value
Example                          Effective boolean value
()                               false
false()                          false
true()                           true
""                               false
"false"                          true
"x"                              true
0                                false
xs:float("NaN")                  false
(false() or false())             false
doc("prices.xml")/*              true
<a>false</a>                     true
<a>{xs:boolean("false")}</a>     true
(false(), false(), false())      Error FORG0006
(1, 2, 3)                        Error FORG0006
xs:date("2015-01-15")            Error FORG0006
[true()]                         Error FORG0006
data( [true()] )                 true

Note that a node that contains a false atomic value is not the same thing as a false atomic value by itself. In the <a>false</a> example in Table 11-1, the effective boolean value is true because a is an element node, not an atomic value of type xs:boolean. This is true even if the a element is declared to be of type xs:boolean.
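To test the content of such a node rather than its mere presence, the value has to be converted explicitly. The following is a minimal sketch, assuming an unvalidated a element and an illustrative variable name:

```xquery
let $flag := <a>false</a>
return (
  (: a node is present, so the effective boolean value is true :)
  boolean($flag),
  (: atomize and cast first, then take the effective boolean value :)
  boolean(xs:boolean($flag))
)
```

This returns true followed by false.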

Function Conversion Rules

When you call a function, sometimes the type of an argument differs from the type specified in the function signature. For example, you can pass an xs:integer to a function that expects an xs:decimal. Alternatively, you can pass an element that contains a string to a function that expects just a string. XQuery defines rules, known as function conversion rules, for converting arguments to the expected type. These function conversion rules apply only if the function expects an atomic value (or sequence of atomic values).

In fact, these function conversion rules use the various methods of type conversion and matching that are described in the preceding sections. They are put together here to show the sequential process that takes place for each argument when a function is called.

  1. Atomization is performed on the argument sequence, resulting in a sequence of atomic values.

  2. Casting of untyped values is performed. For example, the untyped value 12 can be cast to xs:integer. As noted above, typed values are not cast to other types.

  3. If the expected type is numeric or xs:string, type promotion may be performed. This means that a value of type xs:decimal can be promoted to xs:float, and xs:float can be promoted to xs:double. A value of type xs:anyURI can be promoted to xs:string.

Note that these rules do not cover converting a value to the base type from which its type is derived. For example, if an xs:unsignedInt value is passed to a function that expects an xs:integer, the value is not converted to xs:integer. However, subtype substitution does occur, and the function accepts this value.

The reverse is not true; you cannot pass an xs:integer value to a function that expects an xs:unsignedInt, even if the integer you pass meets all the tests for an xs:unsignedInt. The value must be explicitly cast to xs:unsignedInt.

As an example of the function conversion rules, if a function expects an argument of type xs:decimal?, it accepts any of the following:

  • An atomic value of type xs:decimal

  • The empty sequence, because the occurrence indicator (?) allows for it

  • An atomic value of type prod:myDecimal (derived from xs:decimal) because the sequence type xs:decimal? matches derived types as well

  • An atomic value of type xs:integer (derived from xs:decimal) because the sequence type xs:decimal? matches derived types as well

  • An atomic value of type prod:myInteger (derived from xs:integer) because the sequence type xs:decimal? matches derived types as well

  • An untyped atomic value, whose value is 12.5, because it is cast to xs:decimal (step 2)

  • An element of type xs:decimal, because its value is extracted (step 1)

  • An untyped attribute, whose value is 12, because its value is extracted (step 1) and cast to xs:decimal (step 2)

  • An untyped element whose only content is 12.5, because its value is extracted (step 1) and cast to xs:decimal (step 2)

A function expecting xs:decimal* accepts a sequence of any combination of the above items. On the other hand, a function expecting xs:decimal? does not accept:

  • An atomic value of type xs:string, even if its value is 12.5. This value must be explicitly cast to xs:decimal or type error XPTY0004 is raised.

  • An atomic value of type xs:float, because type promotion only works in one direction.

  • An untyped element whose only content is abc, because its value cannot be cast to xs:decimal.

  • An untyped element with no content, because its value "" (not the empty sequence) cannot be cast to xs:decimal.

  • A typed element whose type allows element-only content even if it has no children, because step 1 raises an error.

  • A sequence of multiple xs:decimal values; only one item is allowed.
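The accepted cases can be tried out with a small user-defined function. This is a sketch under stated assumptions: local:half is a hypothetical function introduced only for illustration, and the price element is not schema-validated.

```xquery
(: hypothetical function that expects xs:decimal? :)
declare function local:half($n as xs:decimal?) as xs:decimal? {
  $n div 2
};

(
  local:half(12.5),                 (: an xs:decimal value :)
  local:half(()),                   (: the empty sequence :)
  local:half(7),                    (: xs:integer, via subtype substitution :)
  local:half(<price>12.5</price>)   (: atomized, then cast from untyped :)
  (: local:half("12.5") would raise type error XPTY0004 :)
)
```

The query returns the sequence (6.25, 3.5, 6.25); the call with the empty sequence contributes nothing to the result.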

Sequence Types

A sequence type is used in a query to specify the expected type of a sequence of zero, one, or more items. When declaring functions, sequence types are used to specify the types of the parameters as well as the return value. For example, the function declaration:

declare function local:getProdNums ($catalog as element()) as xs:integer*
  {$catalog/product/xs:integer(number)};

uses two sequence types:

  • element(), to specify that the $catalog parameter must be one (and only one) element

  • xs:integer*, to specify that the return type of the function is zero to many xs:integer values

Sequence types are also used in many type-related expressions, such as the cast as, treat as, and instance of expressions. The syntax of a sequence type is shown in Figure 11-2. The detailed syntax of some sequence types is diagrammed elsewhere, where the related component is described.

Figure 11-2. Syntax of a sequence type

Occurrence Indicators

An occurrence indicator can be used at the end of a sequence type to indicate how many items can be in a sequence. The occurrence indicators are:

  • ? For zero or one items

  • * For zero, one, or more items

  • + For one or more items

If no occurrence indicator is specified, it is assumed that the sequence can have one and only one item. For example, a sequence type of xs:integer matches one and only one atomic value of type xs:integer. A sequence type of xs:string* matches a sequence that is either the empty sequence or contains one or more atomic values of type xs:string. A sequence type of node()? matches either the empty sequence or a single node.

Remember that there is no difference between an item and a sequence that contains only that item. If a function expects xs:string* (a sequence of zero to many strings), it is perfectly acceptable to pass it a single string without attempting to enclose it in a sequence in any way.

The empty sequence, which is a sequence containing zero items, only matches sequence types that use the occurrence indicator ? or *, or empty-sequence().
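The occurrence indicators can be explored with the instance of expression described later in this chapter, as in this short sketch:

```xquery
(
  ()         instance of xs:string*,        (: true: zero items allowed :)
  ()         instance of xs:string+,        (: false: at least one required :)
  "a"        instance of xs:string*,        (: true: a single item is a sequence :)
  ("a", "b") instance of xs:string?,        (: false: at most one allowed :)
  ()         instance of empty-sequence()   (: true :)
)
```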

Generic Sequence Types

Following are some generic sequence types:

item()

Matches any item (node, atomic value of any type, function, map, array)

node()

Matches a node of any kind

empty-sequence()

Matches the empty sequence

xs:anyAtomicType

Matches an atomic value

Table 11-2 shows some examples of the generic sequence types.

Table 11-2. Examples of generic sequence types
Example              Meaning
node()*              A sequence of one or more nodes, or the empty sequence
item()?              One item of any kind, or the empty sequence
xs:anyAtomicType+    A sequence of one or more atomic values (of any type)

These generic sequence types are useful because it is not possible to specify, for example, “one or more xs:string values or nodes.” In this case, you would instead need to specify a more generic sequence type, namely item()+. They’re also useful when defining generic functions such as reverse or count.

Simple Type Names as Sequence Types

The sequence type can also be the qualified name of a specific built-in atomic type, such as xs:integer, xs:double, xs:date, or xs:string. This matches atomic values of that type or any type derived (directly or indirectly) from it. For example, the sequence type xs:integer also matches an atomic value of type xs:unsignedInt, because xs:unsignedInt is indirectly derived by restriction from xs:integer in the type hierarchy. The reverse is not true; the sequence type xs:unsignedInt does not match an xs:integer value; it must be explicitly cast.

These sequence types match atomic values only, not nodes that contain atomic values of the specified type. However, in function calls, nodes can be passed to functions expecting these kinds of atomic sequence types, because of atomization. An element that contains an integer would match element(*, xs:integer) (described in the next section), but would also be acceptable as an argument being passed to a function expecting an xs:integer, for example. Table 11-3 shows some examples.

Table 11-3. Examples of sequence types based on type name
Example           Meaning
xs:integer        One atomic value of type xs:integer (or any type derived by restriction from xs:integer)
xs:integer?       One atomic value of type xs:integer (or any type derived by restriction from xs:integer), or the empty sequence
prod:NameType*    A sequence of one or more atomic values of type prod:NameType, or the empty sequence

List type names cannot be used in sequence types, but it is possible to specify their item type with an occurrence indicator. For example, instead of trying to specify xs:NMTOKENS, which is an illegal sequence type, you could specify xs:NMTOKEN*. A union type name such as xs:numeric can be used in a sequence type, as long as the union type has no list types among its members.

User-defined types such as prod:SizeType can be used in sequence type expressions, but they must have been imported from a schema.

Element and Attribute Tests

The sequence types element() and attribute() can be used to match any one element or attribute (respectively). An alternate syntax, with the same meaning, uses an asterisk, as in element(*) and attribute(*).

It is also possible to test for a specific name. For example, the sequence type:

element(prod:product)

matches any element whose name is prod:product.
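Element and attribute tests can be sketched with an inline element. The element and attribute names here are unprefixed and purely illustrative.

```xquery
let $prod := <product dept="ACC"><number>563</number></product>
return (
  $prod instance of element(),              (: true: any element :)
  $prod instance of element(product),       (: true: the name matches :)
  $prod instance of element(number),        (: false: a different name :)
  $prod/@dept instance of attribute(dept)   (: true :)
)
```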

When schemas are used, it is also possible to test elements and attributes based on their type annotations in addition to their names. This is described in “Sequence Types and Schemas”.

Sequence types can be used to test for other node kinds, using document-node(), text(), comment(), and processing-instruction(). These sequence types are discussed in Chapter 22.

Sequence types can be used to test for function items, including maps and arrays. These sequence types are discussed in “Functions and Sequence Types”, “Maps and Sequence Types”, and “Arrays and Sequence Types”, respectively.

Sequence Type Matching

Sequence type matching is the process of determining whether a sequence of zero or more items matches a specified sequence type, according to the rules specified in the preceding sections. Several kinds of expressions perform sequence type matching, such as the instance of expression described in this section.

Additional static-typing-related expressions, described in Chapter 15, also use the rules for sequence type matching. The typeswitch expression uses sequence type matching to control which expressions are evaluated. Other expressions, namely FLWOR expressions and quantified expressions, allow a sequence type to be specified to test whether values bound to variables match a particular sequence type.
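As a brief preview of how typeswitch relies on sequence type matching, here is a minimal sketch; local:describe is a hypothetical function introduced only for illustration.

```xquery
(: hypothetical function that reports which sequence type an item matches :)
declare function local:describe($item as item()) as xs:string {
  typeswitch ($item)
    (: xs:integer is tested first because an integer also matches xs:decimal :)
    case xs:integer return "integer"
    case xs:decimal return "decimal"
    case element()  return "element"
    default         return "other"
};

(local:describe(3), local:describe(3.5),
 local:describe(<x/>), local:describe("x"))
```

This returns "integer", "decimal", "element", "other".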

The instance of Expression

To determine whether a sequence of zero or more items matches a particular sequence type, you can use an instance of expression, whose syntax is shown in Figure 11-3.

Figure 11-3. Syntax of an instance of expression

The instance of expression does not cast a value to the specified sequence type. It simply returns true or false, indicating whether the value matches that sequence type. Table 11-4 shows some examples of the instance of expression.

Table 11-4. Examples of instance of expressions
Example                                Return value
3 instance of xs:integer               true
3 instance of xs:decimal               true, because xs:integer is derived by restriction from xs:decimal
<x>{3}</x> instance of xs:integer      false, because x is an element node rather than an atomic value, even though it happens to contain an integer
<x>{3}</x> instance of element()       true
<x>{3}</x> instance of node()          true
<x>{3}</x> instance of item()          true
(3, 4, 5) instance of xs:integer       false
(3, 4, 5) instance of xs:integer*      true
xs:float(3) instance of xs:double      false

Sequence type matching does not include numeric type promotion. For this reason, the last example in the table returns false.
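As a small illustration of how instance of can steer processing without casting anything, the following sketch classifies each item in a mixed sequence. The variable name $items and the labels returned are invented for this example.

```xquery
xquery version "3.1";

(: $items is a hypothetical mixed sequence used only for illustration :)
let $items := (3, 3.5, "three", xs:float(3), <x>{3}</x>)
for $item in $items
return
  (: the first matching branch wins, so the most specific type is tested first :)
  if ($item instance of xs:integer) then "integer"
  else if ($item instance of xs:decimal) then "decimal"
  else if ($item instance of xs:double) then "double"
  else if ($item instance of node()) then "node"
  else "other"
```

Because there is no numeric type promotion and no atomization in sequence type matching, the xs:float value falls through to "other" and the element is reported as "node", not as "integer".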

Constructors and Casting

There are two mechanisms in XQuery for explicitly changing values from one type to another: constructors and casting.

Constructors

Constructors are functions used to construct atomic values with given types. For example, the constructor xs:date("2015-05-03") constructs an atomic value whose type is xs:date. The signature of this xs:date constructor function is:

xs:date($arg as xs:anyAtomicType?) as xs:date?

There is a constructor function for each of the built-in simple types (both primitive and derived). The qualified name of the constructor is the same as the qualified name of the type. For the built-in types, constructor names are prefixed with xs to indicate that they are in the XML Schema namespace.

All the constructor functions have a similar signature, in that they accept an atomic value and return an atomic value of the appropriate type. Because function arguments are atomized, you can pass a node to a constructor function, and its typed value is extracted. If you pass an empty sequence to a constructor, the result will be the empty sequence.

Unlike most other functions, constructor functions will accept arguments of any type and attempt to cast them to the appropriate type. The argument value must have a type that can be cast to the new type; otherwise, type error XPTY0004 is raised. Values of almost all types can be cast to and from xs:string and xs:untypedAtomic. The specific rules for casting among types are described in “Casting Rules”.

The value must also be valid for the new type, or error FORG0001 is raised. For example, although the rules allow you to cast an xs:string value to xs:date, the expression xs:date("2015-13-02") raises an error because the month 13 is invalid.
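The distinction between an allowed cast and a valid value can be seen in a short sketch that uses a try/catch expression, which requires XQuery 3.0 or later. The input strings are invented for this example, and the err prefix is declared explicitly because not every processor predeclares it.

```xquery
xquery version "3.1";
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: hypothetical input strings; the second one has an invalid month :)
for $s in ("2015-05-03", "2015-13-02")
return
  try {
    xs:date($s)
  } catch err:FORG0001 {
    (: xs:string may be cast to xs:date, but this particular value is not a valid date :)
    concat("invalid date: ", $s)
  }
```

The first string yields an xs:date value, while the second is caught and reported as a string.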

For list types, constructor functions only accept a single value, but may return multiple atomic values, each an instance of the item type. The value passed to the constructor function must be a string or untyped value that is a space-separated list of values. For example, the expression xs:NMTOKENS("a b c") will return a sequence of three xs:NMTOKEN values.
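A minimal sketch of the list type constructor, assuming a processor that supports the XQuery 3.1 function library:

```xquery
xquery version "3.1";

let $tokens := xs:NMTOKENS("a b c")
return (
  count($tokens),                      (: number of atomic values produced :)
  $tokens instance of xs:NMTOKEN+      (: every item is an xs:NMTOKEN :)
)
```

On such a processor, this should return 3 followed by true.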

Constructors also exist for all named user-defined simple types that are in the in-scope schema definitions. If, in a schema, you have defined a type prod:SizeType that is derived from xs:integer by setting minInclusive to 0 and maxInclusive to 24, you can construct a value of this type using, for example:

prod:SizeType("10")

The qualified names must match, so the prefix prod must be bound to the target namespace of the schema containing the SizeType definition. If the type name is in no namespace (the schema in which it is defined has no target namespace), you cannot use a constructor (unless you change the default function namespace, which is not recommended). You must use a cast expression instead.
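For the constructor to be available, the schema has to be imported in the query prolog. The following is only a sketch: the namespace URI and the schema location are hypothetical, and it assumes a schema-aware processor.

```xquery
xquery version "3.1";

(: hypothetical namespace URI and schema location; requires a schema-aware processor :)
import schema namespace prod = "http://example.com/prod" at "prod.xsd";

(: constructs a prod:SizeType value; a value such as "30" would fail the maxInclusive facet :)
prod:SizeType("10")
```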

The Cast Expression

Casting is the process of changing a value from one type to another. The cast expression can be used to cast a value to another type. It has essentially the same meaning as a constructor function call; it is simply a different syntax. The main differences are that it can be used with a type name that is in no namespace, and that it treats the empty sequence differently, as described later in this section. For example:

$myNum cast as xs:integer

casts the value of $myNum to the type xs:integer. It is equivalent to xs:integer($myNum). The syntax of a cast expression is shown in Figure 11-4.

Figure 11-4. Syntax of a cast expression

The cast expression consists of the expression to be cast, known as the input expression, followed by the keywords cast as, followed by the qualified name of the target type. Only a named simple type can be specified: either a built-in type or a user-defined simple type whose definition is among the in-scope schema definitions.

The type name may optionally be followed by a question mark as an occurrence indicator. In this case, the cast expression evaluates to the empty sequence if the input expression evaluates to the empty sequence. If no question mark is used, the input expression cannot evaluate to the empty sequence, or type error XPTY0004 is raised. This is in contrast to constructors, which always allow the empty sequence.

You cannot use the other occurrence indicators + and * because you cannot cast a sequence of more than one item using a cast expression. If you attempt to do this, type error XPTY0004 is raised. To cast more than one value, you could place your cast expression as the last step of a path, as in:

doc("catalog.xml")//number/(. cast as xs:string)

The input expression can evaluate to a single atomic value, or a single node, in which case it is atomized to retrieve its typed value. As with constructors, the value must have a type that allows casting to the target type, and it must also be a valid value of the target type.
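The following sketch pulls these rules together. The variable $maybeEmpty is invented for the example, and the simple map operator (!) used in the last line requires XQuery 3.0 or later; it is used here in place of a path step because the values being cast are strings rather than nodes.

```xquery
xquery version "3.1";

(: $maybeEmpty is a hypothetical variable that happens to hold the empty sequence :)
let $maybeEmpty := ()
return (
  (: with the question mark, an empty input yields the empty sequence :)
  $maybeEmpty cast as xs:integer?,
  (: a single string value is cast if it is a valid integer :)
  "42" cast as xs:integer,
  (: several values are cast one at a time :)
  ("1", "2", "3") ! (. cast as xs:integer)
)
```

The result is the sequence of xs:integer values (42, 1, 2, 3). Removing the question mark from the first cast would raise XPTY0004 instead.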

The Castable Expression

The castable expression is used to determine whether a value can be cast to another specified simple type. It is sometimes useful to determine this before the cast takes place to avoid dynamic errors, or to determine how the expression should be processed. For example:

if ($myNum castable as xs:integer)
then $myNum cast as xs:integer
else ()

evaluates to the value of $myNum cast to xs:integer if the cast is valid, and to the empty sequence otherwise. If the castable expression had not been used to test this, and $myNum was not castable as an xs:integer, an error would have been raised: FORG0001 if the value is not valid for xs:integer, or XPTY0004 if its type cannot be cast to xs:integer at all. The syntax of a castable expression is shown in Figure 11-5.

Figure 11-5. Syntax of a castable expression

The castable expression consists of an expression, followed by the keywords castable as, followed by the qualified name of the target type. It evaluates to a Boolean value. As with the cast expression, you can use the question mark as an occurrence indicator. The castable expression determines not only whether one type can be cast to the other, but also whether the specific value is valid for the target type.
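A common use of castable is to guard a cast over a sequence of loosely typed input. In this sketch the input strings and the "skipped" label are invented for illustration.

```xquery
xquery version "3.1";

(: hypothetical raw input values :)
let $raw := ("12", "12.1", "abc")
for $v in $raw
return
  if ($v castable as xs:integer)
  then $v cast as xs:integer
  else concat("skipped: ", $v)
```

Only "12" is cast. The other two strings have a type that may be cast to xs:integer, but their values are not valid integers, so castable returns false for them.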

Casting Rules

This section describes the rules for casting atomic values between specific types. These rules are used in cast expressions and constructors. In this section, the source type refers to the type of the original value that is being cast, and the target type refers to the type to which the value is being cast.

Casting among the primitive types

Specific rules exist for casting between each combination of two primitive types. These rules are discussed, along with the types themselves, in Appendix B. The rules can be summarized as follows:

  • Values of any simple type can be cast to and from xs:string and xs:untypedAtomic if the value is valid for the target type. See the next two sections for more information.

  • A value of a numeric type can be cast to any other numeric type if the value is in the value space of the target type.

  • A value of a date, time, or duration type can sometimes be cast to another date, time, or duration type.

  • Other types (xs:boolean, xs:QName, xs:NOTATION, xs:anyURI, xs:hexBinary, and xs:base64Binary) have limited casting ability to and from types other than xs:string and xs:untypedAtomic. See Appendix B for more information on each type.

Casting from xs:string or xs:untypedAtomic

A value of type xs:string or xs:untypedAtomic can be cast to any other primitive type. For example, xs:integer("12") casts the xs:string value 12 to xs:integer. Of course, the string must represent a valid lexical representation of the target type. For example, xs:integer("12.1") raises error FORG0001 because the lexical representation of xs:integer does not allow fractional parts.

When a value is cast from xs:string to another primitive type, whitespace is collapsed. Specifically, this means that every tab, carriage return, and line-feed character is replaced by a single space; consecutive spaces are collapsed into one space; and leading and trailing spaces are removed. Therefore, xs:integer(" 12 ") is valid, even with the leading and trailing whitespace.
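The whitespace rule can be checked with a short sketch. The character references stand for a tab and a line feed.

```xquery
xquery version "3.1";

(
  (: leading and trailing spaces are removed before the value is parsed :)
  xs:integer(" 12 ") eq 12,
  (: a leading tab and a trailing line feed are handled the same way :)
  xs:date("&#x9;2015-05-03&#xA;") eq xs:date("2015-05-03")
)
```

Both comparisons return true.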

Casting to xs:string or xs:untypedAtomic

An atomic value of any type can be cast to xs:string or to xs:untypedAtomic. Some types have special rules about how their values are cast to xs:string. For example, integers have their leading zeros stripped. The rules (if any) for each type are described in Appendix B. Table 11-5 shows some examples of casting to xs:string and xs:untypedAtomic.

Table 11-5. Examples of casting to xs:string and xs:untypedAtomic
Example                           Return value
xs:string("012")                  "012"
xs:string(012)                    "12"
xs:string(xs:float(12.3E2))       "1230"
xs:untypedAtomic(xs:float(12))    12 (of type xs:untypedAtomic)
xs:string(true())                 "true"

Casting among derived types

Now that you have seen casting among the primitive types, let’s look at derived types. There are three different cases.

The first case is that the source type is derived by restriction from the target type. In this case, the cast always succeeds because the source type is a subset of the target type. For example, an xs:byte value can always be cast to xs:integer.

The second case is that the source type and the target type are derived by restriction from the same primitive type. In this case, the cast succeeds as long as the value is in the value space of the target type. For example, xs:unsignedInt("60") can be cast to xs:byte, but xs:unsignedInt("6000") cannot, because 6000 is too large for xs:byte. This case also applies when the target type is derived by restriction from the source type. For example, xs:integer("25") can be cast to xs:unsignedInt, which is derived from it.

The third case is that the source type and the target type are derived by restriction from different primitive types—for example, if you want to cast a value of xs:unsignedInt to prod:myFloat, which is derived by restriction from xs:float. In this case, the casting process has three steps:

  1. The value is cast to the primitive type from which it is derived, e.g., from xs:unsignedInt to xs:decimal.

  2. The value is cast from that primitive type to the primitive type from which the target type is derived, e.g., from xs:decimal to xs:float.

  3. The value is cast from that primitive type to the target type, e.g., from xs:float to prod:myFloat.
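The three cases can be tried out using only built-in types. Because prod:myFloat is a user-defined type, this sketch substitutes xs:token, which is derived from the primitive type xs:string, to illustrate the third case; that substitution is an assumption made for the example.

```xquery
xquery version "3.1";

let $v := xs:unsignedInt("60")
(: third case, written as one cast and as the three explicit steps :)
let $direct := $v cast as xs:token
let $stepwise := (($v cast as xs:decimal) cast as xs:string) cast as xs:token
return (
  (: first case: the source type is derived from the target type :)
  xs:byte("60") cast as xs:integer,
  (: second case: same primitive type, so only the value space matters :)
  $v castable as xs:byte,
  xs:unsignedInt("6000") castable as xs:byte,
  (: third case: the direct cast and the stepwise cast give the same value :)
  $direct eq $stepwise
)
```

The second and third items are true and false respectively, because 60 fits in xs:byte and 6000 does not. The final comparison returns true.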
