Chapter 5. Basic Expressions

Introduction

XQuery is a large language, with many different kinds of expressions. Some of these expressions are so basic that they will appear in almost every query you will write.

This chapter focuses on the most fundamental XQuery operators and functions, beginning with comparisons and sequence manipulation, proceeding to arithmetic, logic, and finally touching on the details of the query prolog not already described in previous chapters.

Even if the XML data you are working with is untyped, the XQuery expressions you write are typed. To illustrate the interactions of types and expressions, we'll use the typed XML fragment shown in Listing 5.1 as the input sequence in the examples throughout this chapter.

Each element in this fragment has the atomic type indicated; the <untyped> element has type xdt:untypedAtomic. Note that there are two string-typed elements, for contrast.

Example 5.1. The types.xml document

<root>
  <integer xsi:type="xs:integer">12</integer>
  <decimal xsi:type="xs:decimal">3.45</decimal>
  <float   xsi:type="xs:float"  >67.8</float>
  <double  xsi:type="xs:double" >0.9</double>
  <string  xsi:type="xs:string" >12</string>
  <string  xsi:type="xs:string" >x</string>
  <untyped>012</untyped>
</root>

Comparisons

Comparisons appear in almost every XQuery expression, whether testing for equality, determining the larger of two values, filtering a sequence, or joining two data sources (to name just a few uses of comparisons).

XPath 1.0 defined six comparison operators: >, <, >=, <=, =, and !=, known in XQuery as general comparisons. XQuery keeps these (with some modifications) and adds nine new comparison operators: the six value comparisons gt, lt, ge, le, eq, and ne; the node comparison is; and the two order comparisons << (“before”) and >> (“after”).

XQuery also defines two built-in comparison functions, listed in Table 5.1 and described briefly later in this section (see Appendix C for more information).

Table 5.1. XQuery built-in comparison functions

Function     Meaning
compare      Compare two atomic values
deep-equal   Compare two entire sequences, using deep equality on nodes

The reason there are so many different comparison operators is the same reason there are so many different XML data models. Some people want lexical equality (in which two values are equal if and only if their string representations are identical), while others want value equality (in which the lexical representation is disregarded, and only the underlying typed value matters). Some people want to compare nodes based on their content, others on their node identity, while others still want complete structural comparisons (aka deep equality). XQuery handles all of these cases.

Value Comparisons

The six value comparisons are binary operators that test whether the left operand is equal to (eq), not equal to (ne), greater than (gt), greater than or equal to (ge), less than (lt), or less than or equal to (le) the right operand. These operators first apply some implicit type conversions to their operands, and then return a single xs:boolean value corresponding to whether the comparison is true or false. Listing 5.2 demonstrates all of them.

Example 5.2. The six value comparison operators

2 le 1  => false
2 lt 1  => false
2 ge 1  => true
2 gt 1  => true
2 ne 1  => true
2 eq 1  => false

First, both operands are atomized (as described in Chapter 2), turning them into sequences of atomic values (if they weren't already). If after atomization either operand isn't a single value, then an error is raised. Consequently, both of the expressions in Listing 5.3 result in errors.

Example 5.3. Value comparisons operate only on singletons

() eq 0      => error
(0, 1) eq 0  => error

Numeric type promotion and subtype substitution are applied (also described in Chapter 2), meaning that numeric types are promoted to a common type, and when one operand is a subtype of the other, it is promoted to that type. To support lexical comparisons, untyped operands (that is, operands typed as xdt:untypedAtomic) are cast to xs:string and thus compared using their string values. If after all this the values have different types, then an error is raised. Otherwise, they are compared using the rules for that type, resulting in either true or false.

The examples in Listing 5.4 illustrate value comparisons on simple atomic constants. The ones in Listing 5.5 use the XML fragment introduced at the beginning of this chapter to select typed nodes and compare them.

Example 5.4. The effects of type conversions on value comparisons

2 gt 1               => true  (: compared as xs:integer :)
2 gt 1.0             => true  (: compared as xs:decimal :)
2 gt 1E0             => true  (: compared as xs:double :)
2 eq "1"             => error (: incompatible types :)

Example 5.5. Value comparisons used on nodes

string eq integer    => error (: two string values :)
untyped eq integer   => error (: untyped is cast to xs:string; incompatible types :)
string[1] eq untyped => false (: compared as xs:string :)
string[1] eq integer => error (: incompatible types :)

Every type supports comparison using eq and ne. However, only the following totally ordered types (and subtypes of them) support the other value comparison operators: xs:boolean, xs:string, xs:date, xs:time, xs:dateTime, xdt:yearMonthDuration, and xdt:dayTimeDuration, and the numeric types (xs:integer, xs:decimal, xs:float, and xs:double). All other types result in an error when used with gt, ge, lt, or le. Listing 5.6 demonstrates a few value comparisons.

Example 5.6. Value comparisons work on almost all types

"a" eq "b"        => false  (: depends on default collation :)
0E0 eq -0E0       => true
true() gt false() => true

String values are compared using the default collation. Usually this is the Unicode code point collation, meaning two strings compare as equal if and only if they contain exactly the same sequence of Unicode code points. However, most implementations support other collations (see Section 5.6). String comparison is the same as the one performed by the built-in compare() function (see Appendix C).

Values of type xs:boolean are ordered so that true compares greater than false. Binary values (xs:hexBinary and xs:base64Binary) and values of type xs:anyURI and xs:NOTATION compare as equal if they have the same length and code points, irrespective of collation.

Finally, two xs:QName values are compared by comparing their local name and namespace parts separately (using code points). The values are equal if their local names are equal and both lack namespaces, or if their local names are equal and their namespaces are equal.

General Comparisons

Value comparisons are fine when you have two singleton values to compare, but what about comparing sequences of values? The answer depends on the kind of comparison you want to perform.

The general comparison operators support the most common case, existential sequence comparisons. That is, each general comparison operator tests whether there exists an item in the left operand sequence and independently there exists an item in the right operand sequence such that the two items compare true. (Chapter 6 shows how to perform other kinds of sequence comparisons, such as memberwise.) Listing 5.7 shows all of the general comparison operators in use.

Example 5.7. The six general comparison operators

2 <= 1  => false
2 < 1   => false
2 >= 1  => true
2 > 1   => true
2 != 1  => true
2 = 1   => false

Like the value comparison operators, the general comparisons first atomize both operands. However, the general comparison operators work on sequences with any number of members, apply slightly different type conversions, and differ in their handling of the xdt:untypedAtomic type. For example, none of the comparisons in Listing 5.8 result in errors, in contrast to the analogous expressions using value comparisons.

Example 5.8. General comparisons work on sequences

0 = 0       => true
() = 0      => false
(0, 1) = 0  => true

Finally, the items are compared using the corresponding value comparison operator (differing only in how they handle untyped data, explained next). The six general comparison operators =, !=, <, <=, >, and >= correspond to the six value comparison operators eq, ne, lt, le, gt, and ge. These operators apply numeric type promotion and subtype substitution as described in the previous section, and then compare the two values. The overall general comparison is true if the value comparison is true for any pair of items, otherwise it's false.

During this pairwise comparison, if either item being compared is untyped, then a special conversion takes place: If both operands are untyped (that is, typed as xdt:untypedAtomic), then both are cast to xs:string. If one is untyped and the other is numeric, then both are cast to xs:double. Otherwise, if one is untyped, then it is cast to the type of the other operand. In contrast, the value comparison operators by themselves always convert untyped values to xs:string. Listing 5.9 illustrates some of these differences, using the XML document in Listing 5.1 as the input sequence.

Example 5.9. The effects of type conversions on general comparisons

2 > 1               => true  (: compared as xs:integer :)
2 > 1.0             => true  (: compared as xs:decimal :)
2 > 1E0             => true  (: compared as xs:double :)
2 > "1"             => error (: incompatible types :)
string = integer    => error (: incompatible types :)
untyped = integer   => true  (: compared as xs:double :)
string = untyped    => false (: compared as xs:string :)

By now, you may have realized two unusual characteristics of the general comparison operators. One is that they don't test whether two sequences are exactly identical. The expression (1, 2) = (1, 3) results in true because there exists a member in the left sequence (1) and a member in the right sequence (1) such that they are equal, even though the sequences themselves are different. The expression (1, 2) != (1, 2) also results in true, because there exists a member in the left sequence (1) and a member in the right sequence (2) that are unequal, even though the sequences as a whole are the same. Listing 5.10 demonstrates this effect, and its interaction with not().

Example 5.10. General comparisons test existence, with surprising results

(1, 2) = (2, 3)            => true
(1, 2) = (1, 2)            => true
(1, 2) != (1, 2)           => true
not((1,2) = (1,2))         => false

The deep-equal() function (described in Section 5.2.4) and the iteration operators of Chapter 6 provide other ways to compare sequences, instead of testing existence.
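When you need "every item" or "no item" semantics rather than existence, it can help to spell out the intent. The following is a minimal sketch that uses only literal sequences, so it doesn't depend on any input document. The expected results in the comments follow from the existential rule described above.

```xquery
(
  (1, 2) = (1, 3),                                   (: true: the pair (1, 1) is equal :)
  (1, 2) != (1, 2),                                  (: true: the pair (1, 2) is unequal :)
  not((1, 2) = (3, 4)),                              (: true: no pair of items is equal :)
  (every $x in (1, 2) satisfies $x = (1, 2, 3)),     (: true: each left item has a match :)
  (every $x in (1, 2) satisfies $x = (1, 3))         (: false: 2 has no match :)
)
```

The quantified every expression is one of the iteration operators covered in Chapter 6.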

Another unusual characteristic is that the general comparison operators are nondeterministic when errors are involved. For example, (1, "2") = 1 might raise a type error (because comparing xs:string with xs:integer isn't allowed), or it might return true (because 1 equals 1). The answer depends on the order in which the implementation iterates through the sequences, which may vary from one implementation to the next, and may vary even within a single implementation during the execution of a query. For more surprises, see Chapter 11.

Node Comparisons

Nodes can be compared in even more ways than values. Because the value and general comparison operators atomize their operands, they compare the atomic values of nodes. However, you may also want to compare nodes by their identity, by their order within a document, or by their names and structure instead of just simple values (so-called “deep equality”). For these purposes, XQuery provides the node comparison operator is, the order comparison operators << (“before”) and >> (“after”), and a built-in function, deep-equal(), respectively.

Both the node and order comparisons require that their operands be either single nodes or the empty sequence, otherwise an error is raised. If either operand is the empty sequence, then the comparison returns the empty sequence. Otherwise, it returns the boolean result of the comparison (true or false).

The node comparison operator is returns true if the two nodes are the same node (by identity), otherwise it returns false. Its interactions with constructed nodes, variables, and the doc() function (recall it always returns the same node for the same string) are shown in Listing 5.11.

Example 5.11. The node comparison operator is

<x/> is <x/>                       => false

let $a := <x/> return $a is $a     => true

doc("team.xml") is doc("team.xml") => true

The order comparison operator << returns true if the left node appears before the right node in document order, otherwise it returns false. The >> operator returns the opposite—true if the left node appears after the right node in document order. Remember, nodes from different documents have an implementation-dependent ordering (although that ordering doesn't change during the execution of a query). Listing 5.12 demonstrates the before and after operators.

Example 5.12. The two order comparison operators

let $a := <x><y/></x> return $a << $a/y   => true
let $a := <x><y/></x> return $a >> $a/y   => false

Node identity and order comparisons are most commonly used when navigating an existing XML document. For example, you may want to test whether two paths select the same node, or whether the node selected by one path appears before the node selected by another path.

When using paths with these operators, keep in mind that both operands need to select at most one node. If a path may select multiple nodes, use one of the operators from Chapter 6 to iterate through the sequence and compare each member individually, as shown in Listing 5.13.

Example 5.13. Node comparisons are most commonly used with navigation

for $oldtimer in doc("team.xml")//Employee[@years > 7]
for $lead in doc("team.xml")//Employee[contains(@title, "Lead")]
where $oldtimer is $lead
return $oldtimer
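To experiment with these operators without an external document, you can bind a constructed tree to a variable and navigate it. The sketch below assumes a small hypothetical Team element; the variable names and the id attribute are illustrative only.

```xquery
let $team := <Team><Employee id="1"/><Employee id="2"/></Team>
let $first := $team/Employee[1]
let $second := $team/Employee[2]
return (
  $first is $team/Employee[@id = "1"],   (: true: both paths select the same node :)
  $first is $second,                     (: false: different nodes :)
  $first << $second,                     (: true: $first comes earlier in document order :)
  $second >> $first,                     (: true :)
  $team/Missing is $first                (: empty sequence: the left operand is empty :)
)
```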

Sequence and Tree Comparisons

Although the general comparisons can compare members of sequences (using existence) and the node and order comparisons can compare nodes (using identity), sometimes you want to compare an entire sequence or an entire XML tree at once. The deep-equal() built-in function performs both of these tasks (see Listing 5.14).

Example 5.14. The deep-equal() function compares entire sequences and trees

(1, 2) = (2, 3)            => true
deep-equal((1, 2), (2, 3)) => false
deep-equal((1, 2), (1, 2)) => true
<x/> is <x/>               => false
deep-equal(<x/>, <x/>)     => true
deep-equal(<a><b c="1"/></a>, <a><b c="2"/></a>) => false
deep-equal(<a><b c="1"/></a>, <a><b c="1"/></a>) => true

It takes two sequences as arguments, and an optional third argument specifying the collation to use when comparing string values (the default collation is used otherwise). It then tests whether the two sequences are exactly equal. Nodes are compared as entire trees, not merely by identity or value.

It compares node kinds, names, and contents of the entire subtree defined by each node, ignoring all processing-instruction and comment nodes. (Attributes are treated as unordered, but otherwise ordering matters.) The schema types of nodes aren't compared, nor are certain other node properties such as base URI, so deep-equal() doesn't perform an exact comparison of the two data models.
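The rules in the preceding paragraph can be checked with a few constructed elements. This is a sketch only; the element and attribute names are made up for illustration.

```xquery
(
  deep-equal(<a x="1" y="2"/>, <a y="2" x="1"/>),      (: true: attribute order is ignored :)
  deep-equal(<a><b/><c/></a>, <a><c/><b/></a>),        (: false: child element order matters :)
  deep-equal(<a><!-- note --><b/></a>, <a><b/></a>),   (: true: comment nodes are ignored :)
  deep-equal((1, 2), (2, 1))                           (: false: sequence order matters :)
)
```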

Sequences

Now that we've explored the ways to compare values, nodes, and sequences, let's examine the other operators XQuery provides, starting with the sequence operators. XQuery provides several operators that are specifically dedicated to constructing, combining, and otherwise manipulating sequences.

Recall also that in XQuery, sequences contain only atomic values and nodes, and are never nested. Sequences are ordered using 1-based indices (so the first member appears at position 1). The empty sequence is denoted using empty parentheses ().

Unlike XPath 1.0, XQuery sequences may contain duplicate nodes (by identity). For example, the expression let $a := <a/> return ($a, $a) constructs a sequence containing the same element twice. This is different from the expression (<a/>, <a/>), which constructs a sequence containing two different elements—both of which happen to have the same name and content.

XQuery defines five main operators for working with sequences: concatenation (the comma operator ,), union (the union keyword or the vertical bar |), intersection (intersect), difference (except), and range (to).

XQuery also defines a plethora of built-in functions dedicated to sequence manipulation, as shown in Table 5.2. Most of these are highlighted in this section; all of them are completely covered in Appendix C.

Table 5.2. XQuery built-in sequence functions

Function          Meaning
count             The length of the sequence
distinct-values   Remove all duplicate values
empty             True if the sequence is empty
exists            True if the sequence is non-empty
index-of          Find an item in the sequence
insert-before     Insert items into a sequence
remove            Remove items from a sequence
reverse           Reverse a sequence
subsequence       Select a subsequence
unordered         Hint that ordering is unimportant

Constructing Sequences

XQuery uses the comma operator to concatenate sequences together. For example, given the two sequences (1, 2) and (1, 3), their concatenation ((1, 2), (1, 3)) is the (flattened) sequence (1, 2, 1, 3).

As this example shows, concatenation doesn't remove duplicates (by value or node identity). Duplicate values can be removed by applying the distinct-values() function. For example, distinct-values(((1, 2), (1, 3))) results in the sequence (1, 2, 3) or (2, 1, 3) (the order in which duplicates are removed is implementation-dependent). Distinct values are often useful when grouping; see Chapter 6 for examples.

Duplicate nodes (by node identity) can be removed by using the union operator to combine the sequences. For example, let $a := <a/> return ($a, $a) results in a sequence containing the same node twice, but let $a := <a/> return $a union $a results in a sequence containing only one node. As with distinct-values(), the order in which duplicate nodes are removed is implementation-dependent. Because paths already eliminate duplicates and sort node sequences by document order, the same effect can be achieved using a path. For example, let $a := <a/> return ($a, $a)/. removes the duplicate node by applying the navigation step /. (a slash followed by a dot) to the sequence.
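Counting the results makes the difference between concatenation, union, and a path step easy to see. The following sketch uses a single constructed element bound to a hypothetical variable $a.

```xquery
let $a := <a/>
return (
  count(($a, $a)),          (: 2: concatenation keeps the duplicate :)
  count($a union $a),       (: 1: union removes duplicates by identity :)
  count(($a, $a)/.),        (: 1: the path step removes duplicates too :)
  count((<a/>, <a/>)/.)     (: 2: two distinct nodes that merely look alike :)
)
```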

When combining paths in a union, it is common to use the abbreviation | (vertical bar) instead of the keyword union. For example, doc("team.xml")//Employee/(Name | Title) selects all Name and Title child elements (in document order) from each Employee element.

Not only can XQuery combine sequences using set union and set concatenation, but it can also subtract them using set intersection (intersect) and set difference (except). Both of these operators accept only node sequences; an error is raised if either operand contains an atomic value.

The intersect operator computes all the nodes that appear in both of its arguments. (Duplicate nodes are removed from the result, if necessary.) For example, doc("team.xml")//Employee/.. intersect doc("team.xml")/Team finds the parent nodes of all Employee elements and intersects this set with the root Team element. The result is the root Team element (because it's the only node in both sets). It's somewhat uncommon to use intersect and except with paths; this example could have been expressed more efficiently as a predicate—doc("team.xml")/Team[Employee]— but on occasion these set operators can be useful.

The except operator computes the asymmetric difference of two sets. That is, the expression A except B results in all nodes that are in A but not in B (with duplicates removed). The symmetric difference—the nodes that are in one of A or B but not both—can be calculated either by taking the union of the two asymmetric differences or else computing the difference of their union with their intersection (see Listing 5.15).

Example 5.15. Two ways to compute the symmetric difference

(A except B) union (B except A)
(A union B) except (A intersect B)
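As a concrete illustration, the sketch below builds two overlapping node sequences from one small constructed tree and applies both formulas. The names $doc, $A, and $B are placeholders chosen for this example.

```xquery
let $doc := <r><a/><b/><c/></r>
let $A := ($doc/a, $doc/b)
let $B := ($doc/b, $doc/c)
return (
  ($A except $B) union ($B except $A),       (: the a and c elements :)
  ($A union $B) except ($A intersect $B)     (: the same two elements again :)
)
```

Both expressions select the a and c elements, because b is the only node common to the two sequences.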

Finally, XQuery provides an operator, to, that constructs a sequence of consecutive integers. It takes two integer operands and computes the sequence of all the integers between them (inclusive). When the first operand is greater than the second, the empty sequence is returned; when the two operands are equal, the sequence consists of that one integer. The to operator is demonstrated in Listing 5.16.

Example 5.16. The range operator is a great way to construct integer sequences

1 to 4    =>   (1, 2, 3, 4)
4 to 1    =>   ()
4 to 4    =>   (4)

Ranges are most useful when iterating, because implementations can optimize the expression to avoid buffering the entire sequence in memory.
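For instance, a range can drive a simple loop. This short sketch uses the for expression (covered in Chapter 6) to iterate over a range without writing the integers out by hand.

```xquery
(
  for $i in 1 to 3 return $i * $i,     (: (1, 4, 9) :)
  sum(for $i in 1 to 100 return $i)    (: 5050 :)
)
```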

The to operator ordinarily constructs sequences of consecutive integers, but it's easily adapted to use other increments. For example, the predicate [. mod 2 = 1] can be applied to keep all the odd numbers and [. mod 2 = 0] keeps all the even numbers, as shown in Listing 5.17. (The comparison is required: a predicate whose value is a number, such as [. mod 2], is treated as a position test rather than a boolean test.)

Example 5.17. Ranges can be used to construct other kinds of integer sequences

(0 to 10)[. mod 2 = 1]  => (1, 3, 5, 7, 9)
(0 to 10)[. mod 2 = 0]  => (0, 2, 4, 6, 8, 10)

Processing Sequences

As the previous section shows, predicates are a natural way to select members from a sequence by position. For example, $seq[1] selects the first item in the sequence, while $seq[last()] selects the last item in the sequence. If the index is out of bounds then the empty sequence is returned (not an error).

The reverse() function reverses the order of items in a sequence. A range of values can be selected either using comparisons in the predicate, or using the subsequence() function. For example, $seq[1 <= position() and position() <= 3] selects the first three items of the sequence; so does subsequence($seq, 1, 3). Listing 5.18 demonstrates the use of subsequence(), reverse(), and numeric predicates to filter sequences.

Example 5.18. Selecting items from a sequence

("a", "b", "c")[1]                 => "a"
("a", "b", "c")[last()]            => "c"
("a", "b", "c")[-1]                => ()
reverse(("a", "b", "c"))           => ("c", "b", "a")
subsequence(("a", "b", "c"), 2)    => ("b", "c")
subsequence(("a", "b", "c"), 1, 2) => ("a", "b")

The built-in functions empty() and exists() can be used to test whether a sequence is empty or not, respectively. More generally, the count() function computes the length of a sequence, as shown in Listing 5.19.

Example 5.19. Computing or testing the sequence length

count(("a", "b", "c")) => 3
count(())              => 0
empty(())              => true
empty((1, 2))          => false
exists(())             => false
exists((1, 2))         => true

XQuery also provides functions for searching a sequence for an item (index-of()), constructing a new sequence from an existing one with an item inserted into it (insert-before()), and constructing a new sequence from an existing one by removing an item from it (remove()).

The index-of() function takes two arguments: the sequence and the item to search for (by value). Optionally, a third collation argument may be specified for string searches. It returns a list of all integer positions at which the item occurs, in order from least to greatest (empty if the item doesn't occur in the sequence).

Note that the index-of() function can be very inefficient on large sequences because it doesn't stop searching at the first occurrence. Some implementations may optimize index-of($seq, $value)[1] (the first index) and index-of($seq, $value)[last()] (the last index).

The insert-before() function takes three arguments: the original sequence, the insertion position, and a sequence of zero or more items to be inserted. It performs the requested insertion (inserting before the insertion point). Similarly, the remove() function takes two arguments: the sequence and a position to exclude. It returns the sequence consisting of all items except the one at that position. Note that neither of these functions alters the original sequence. These three functions are demonstrated in Listing 5.20.

Example 5.20. Searching, inserting into, and removing from sequences

index-of((4, 5, 6, 4), 5)              => 2
index-of((4, 5, 6, 4), 7)              => ()
index-of((4, 5, 6, 4), 4)              => (1, 4)
insert-before((1, 2, 3, 4), 2, (5, 6)) => (1, 5, 6, 2, 3, 4)
insert-before((1, 2, 3, 4),-2, (5, 6)) => (5, 6, 1, 2, 3, 4)
insert-before((1, 2, 3, 4), 4, (5, 6)) => (1, 2, 3, 5, 6, 4)
insert-before((1, 2, 3, 4), 5, (5, 6)) => (1, 2, 3, 4, 5, 6)
remove((4, 5, 6, 4), 2)                => (4, 6, 4)
remove((4, 5, 6, 4), 4)                => (4, 5, 6)

The index-of() function can be paired with the position() function to remove all occurrences of a particular value from a sequence, as shown in Listing 5.21.

Example 5.21. Removing by value instead of by index

declare function remove-value($seq, $val as xdt:anyAtomicType) {
  (: not(... = ...) is required; != would be true for every position :)
  $seq[not(position() = index-of($seq, $val))]
};

remove-value((4, 5, 6, 4), 4)   =>  (5, 6)
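The choice of comparison in a positional filter like this one matters, because general comparisons are existential. The sketch below filters a literal sequence, bound to a hypothetical variable $seq, against the positions (1, 4) in two ways, and then shows a simpler filter on the value itself.

```xquery
let $seq := (4, 5, 6, 4)
return (
  $seq[not(position() = (1, 4))],   (: (5, 6): drops positions 1 and 4 :)
  $seq[position() != (1, 4)],       (: (4, 5, 6, 4): every position differs from 1 or from 4 :)
  $seq[. != 4]                      (: (5, 6): compares each item to the value directly :)
)
```

When the value to remove is a single atomic value, the last form is usually the easiest to read.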

Finally, XQuery provides the unordered() function as an optimization hint. This function just tells an implementation that the sequence order is unimportant; for example, paths normally sort by document order, but perhaps you don't need this. The effects are implementation-dependent.

Arithmetic

XQuery isn't a language designed for heavy-duty mathematics, but it does support eight of the most common arithmetic expressions: addition (+), subtraction (-), multiplication (*), floating-point division (div), integer division (idiv), modulus (mod), unary plus (+), and unary minus (-). In addition, XQuery provides nine built-in arithmetic functions, listed in Table 5.3. XQuery doesn't define more complex arithmetic operations such as trigonometry or logarithms, although these may be available in some XQuery implementations through extension functions.

Table 5.3. XQuery built-in arithmetic functions

Function             Meaning
floor                Round down (to negative infinity)
ceiling              Round up (to positive infinity)
abs                  Compute the absolute value
min                  Compute the minimum value
max                  Compute the maximum value
avg                  Compute the average value
sum                  Compute the sum
round                Round to the closest integer, ties rounded up
round-half-to-even   Round to the closest integer, ties to nearest even number

The syntax used for the arithmetic operators can be tricky. Except for addition, all of them can be used in other contexts with different meanings unrelated to arithmetic: The division and modulo operators (div, idiv, mod) are valid names, and can be used in paths (no keywords are reserved). For example, the query div idiv mod selects the elements named div and mod, and then computes their integer division. Confusing!

The punctuation symbols are also a common source of confusion: The hyphen is a valid name character and must be separated from names by a space or else it will be parsed as part of the name. The plus symbol is used in some type expressions, and the multiplication operator is used in both type expressions and as a path wildcard. The examples in Table 5.4 illustrate their meanings in different contexts.

Table 5.4. Beware of the XQuery punctuation rules

Example              Meaning
a-1                  The name a-1
a - 1                a minus 1
a/b                  The path a/b
a div b              a divided by b
a * b                a multiplied by b
div/* cast as div*   Child elements of div cast to the type div*

Except for idiv and unary plus, all of these operators were available in XPath 1.0, but behave somewhat differently in XQuery. In XPath 1.0, all arithmetic operations were carried out in double-precision floating-point arithmetic. In XQuery, the arithmetic rules reflect the many more numeric types that are available.

Like the comparison operators, the arithmetic operators first atomize both operands. If either operand is the empty sequence, then the expression results in the empty sequence (not an error). If either operand isn't a singleton, then an error is raised.

Otherwise, numeric type promotion is applied (as described in Chapter 2), and the operation is performed on the two numbers (see Listing 5.22). The only other type that is allowed is untyped data (xdt:untypedAtomic). Operands that are untyped are promoted to xs:double for all operators except idiv, which promotes untyped values to xs:integer.

Example 5.22. Arithmetic operators atomize and perform numeric promotion

1 + 2              => 3
1 + 2.0            => 3.0
1 + 2E0            => 3E0
untyped - 2        => 1E1
string - 2         => error (: incompatible types :)
() * 0             => ()
() + (2, 0)        => ()
(1, 1) + (2, 0)    => error (: non-singleton operands :)

The behavior of the addition, subtraction, multiplication, and division operators should be mostly what you would expect. Except for the division of integers, which results in a decimal value, all of these result in a value with the same type as the operands after numeric type promotion has been applied.

The integer division operator idiv requires that both of its operands be integers, otherwise an error is raised. It then carries out truncating integer division (so 5 idiv 3 equals 1). The mod operator computes the modulus (the value that is left over from division). Neither of these operators is commonly used.

See Appendix B for rigorous definitions of all the arithmetic operators. Listing 5.23 provides examples of *, div, idiv, and mod.

Example 5.23. Multiplication, division, and modulus operators

1 * 2            => 2
1.0 * 2.0        => 2.0
1E0 * 2E0        => 2.0E0
float * float    => xs:float("4596.84")
3 div 1          => 3.0
1.0 div 2        => 0.5
1E0 div 2        => 5E-1
1.0 div 0        => error (: divide-by-0 error :)
1E0 div 0        => INF   (: double, float support division by 0 :)
3 mod 2          => 1
3 mod 1.5        => 0.0
3 mod 1.2        => 0.6
3 mod 2.2        => 0.8
3E0 mod 2        => 1E0
7 idiv 2         => 3
7 mod 2          => 1
7 div 2          => 3.5
7 idiv 2.0       => error (: idiv works only on integers :)
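The idiv and mod operators are related: for integers, the quotient times the divisor plus the remainder gives back the dividend. The sketch below checks this for one pair of values, bound to hypothetical variables $a and $b, and shows the behavior with a negative dividend.

```xquery
let $a := 7, $b := 2
return (
  $a idiv $b,                               (: 3 :)
  $a mod $b,                                (: 1 :)
  ($a idiv $b) * $b + ($a mod $b) eq $a,    (: true :)
  -7 idiv 2,                                (: -3: the quotient is truncated toward zero :)
  -7 mod 2                                  (: -1: the remainder takes the sign of the dividend :)
)
```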

Arithmetic operators can overflow or underflow. For decimal values, implementations are required to report overflow and to return 0.0 on underflow. For all other types, implementations are allowed to choose between raising an error and allowing the overflow or underflow to occur (with various results—see Appendix B for details).

Note also that floating-point arithmetic is performed according to the IEEE 754 specification, including the special values positive and negative zero (0.0 and -0.0), positive and negative infinity (INF and -INF), and a (non-signaling) not-a-number value (NaN). Consequently, dividing an xs:double or xs:float value by zero isn't an error. Listing 5.24 shows some of these interactions.

Example 5.24. Special floating-point values

1E0 div 0          =>  INF
-1E0 div 0         => -INF
0E0 div 0          =>  NaN
0 div 0            =>  error (: integer division by zero :)
1 div (-1E0 div 0) => -0.0

Generally speaking, fixed-point arithmetic should perform exact arithmetic without loss of precision. In practice, xs:decimal is usually implemented with limited precision. Because fixed-point arithmetic isn't commonly supported in hardware, most XQuery implementations are forced to emulate it in software. Consequently, fixed-point arithmetic often suffers from relatively poor performance and inconsistent behavior across implementations when compared to floating-point arithmetic. You should use decimal arithmetic only when your applications require it.

Finally, XQuery provides a few arithmetic functions for computing aggregate statistics over sequences of numbers and rounding.

The max(), min(), sum(), and avg() functions compute the maximum, minimum, sum, and average of a sequence of numbers, respectively. The max() and min() functions actually work on values of any type; they accept an optional second argument specifying the collation to use when comparing string values. They can even be used on sequences containing different types of numbers (for example, max((1, 1.5))), but the details are somewhat complicated (see Appendix C).

Example 5.25. Aggregation functions

max((1, 2, 3))       => 3
min((1, 2, 3))       => 1
avg((1, 2, 3))       => 2.0
sum((1, 2, 3))       => 6
sum(("a", "b", "c")) => error (: not numbers :)

There are seven rounding modes in popular use today, and XQuery provides functions for four of them (Chapter 10 shows how to implement the other three). All four take any number, and return a number of the same type. The floor() function returns the greatest integer less than or equal to its argument (rounding down toward negative infinity). The ceiling() function returns the least integer greater than or equal to its argument (rounding up toward positive infinity). The round() and round-half-to-even() functions round to the closest integer, but differ in how they handle ties (such as 0.5). The round() function always rounds ties up (toward positive infinity), while the round-half-to-even() function rounds ties to the nearest even number. The round-half-to-even() function also accepts an optional second argument, which specifies the precision at which to do the rounding (see Appendix C). Listing 5.26 illustrates the rounding functions.

Example 5.26. Rounding functions

floor(2.2) => 2.0
floor(2.5) => 2.0
floor(2.6) => 2.0
round(2.2) => 2.0
round(2.5) => 3.0
round(2.6) => 3.0
round-half-to-even(2.2) => 2.0
round-half-to-even(2.5) => 2.0
round-half-to-even(2.6) => 3.0
ceiling(2.2) => 3.0
ceiling(2.5) => 3.0
ceiling(2.6) => 3.0
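
The four rounding modes can also be sketched in Python (not XQuery) with the standard decimal module. The xq_ function names are hypothetical, and the sketch assumes that "rounds ties up" means toward positive infinity, which is why xq_round is written as floor(x + 0.5) rather than with Python's ROUND_HALF_UP (which rounds ties away from zero).

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN

def xq_floor(x: Decimal) -> Decimal:
    # Greatest integer less than or equal to x.
    return x.to_integral_value(rounding=ROUND_FLOOR)

def xq_ceiling(x: Decimal) -> Decimal:
    # Least integer greater than or equal to x.
    return x.to_integral_value(rounding=ROUND_CEILING)

def xq_round(x: Decimal) -> Decimal:
    # Nearest integer; ties go toward positive infinity: floor(x + 0.5).
    return (x + Decimal("0.5")).to_integral_value(rounding=ROUND_FLOOR)

def xq_round_half_to_even(x: Decimal) -> Decimal:
    # Nearest integer; ties go to the even neighbor.
    return x.to_integral_value(rounding=ROUND_HALF_EVEN)

for value in ("2.2", "2.5", "2.6", "-2.5"):
    x = Decimal(value)
    print(value, xq_floor(x), xq_round(x),
          xq_round_half_to_even(x), xq_ceiling(x))
```

For 2.5 the two round functions disagree (3 versus 2). For -2.5 both give -2: round() because the tie goes toward positive infinity, round-half-to-even() because -2 is the even neighbor.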

XQuery doesn't provide any other arithmetic functions, although it's possible to construct your own (and implementations may also provide some). For example, Listing 5.27 defines an exponentiation function that raises its first argument to the power of its second argument.

Example 5.27. An exponentiation function

declare function pow($b as xs:integer,
                     $exp as xs:integer) as xs:integer {
  if ($exp < 1)
    then 1
  else
    $b * pow($b, $exp - 1)
};

pow(2, 10) => 1024
pow(2, 20) => 1048576
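
For comparison, here is a Python transliteration of the recursive definition above (a sketch, not XQuery; both function names are hypothetical). The first function mirrors the listing step for step. The second swaps in a different technique, exponentiation by squaring, which needs only about log2(exp) recursive calls instead of exp.

```python
def int_pow(base: int, exp: int) -> int:
    """Mirror of the listing: multiply base by itself exp times."""
    if exp < 1:
        return 1  # same base case as the listing: exponents below 1 yield 1
    return base * int_pow(base, exp - 1)

def int_pow_fast(base: int, exp: int) -> int:
    """Alternative technique: exponentiation by squaring."""
    if exp < 1:
        return 1
    half = int_pow_fast(base, exp // 2)
    # Square the half power; multiply in one extra base when exp is odd.
    return half * half * (base if exp % 2 else 1)

print(int_pow(2, 10))       # 1024
print(int_pow(2, 20))       # 1048576
print(int_pow_fast(2, 20))  # 1048576
```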

Logic

Like most languages, XQuery can express all the familiar concepts from boolean logic, including the two boolean constants (true, false), the boolean operators (and, or, not), and conditionals (if/then/else). (In fact, XQuery supports the full predicate calculus: conjunction, disjunction, conditions, and existential and universal quantification.)

The and and or operators are written as keywords, but for compatibility with XPath 1.0, the not() operator is written as a function. All three implicitly coerce their operands to the xs:boolean type by applying the Effective Boolean Value (defined in Chapter 2), as if by a call to the boolean() function. These operators are demonstrated in Listing 5.28.

Example 5.28. Logical operators apply the Effective Boolean Value

true() or false()   => true
1 or 0              => true
true() and false()  => false
1 and 0             => false
"1" and "0"         => true  (: boolean("0") = true :)
not(true())         => false
not(())             => true

The effects of EBV can be surprising; for example, not("false") results in false—because EBV looks at the string length, not its content, so "false" is converted to true, and then negated by not()—while not(xs:boolean("false")) results in true.
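
The distinction can be modeled with a short Python sketch (not XQuery). It is deliberately simplified: it covers only the empty sequence (represented here as None), booleans, numbers, and strings, and ignores nodes and longer sequences. The names ebv and xs_boolean are hypothetical stand-ins for the Effective Boolean Value rule and the xs:boolean() cast.

```python
def ebv(value) -> bool:
    """Simplified Effective Boolean Value of a single atomic value."""
    if value is None:              # the empty sequence
        return False
    if isinstance(value, bool):    # test bool first: bool is a subclass of int
        return value
    if isinstance(value, (int, float)):
        # Zero and NaN are false; NaN is the only value not equal to itself.
        return value != 0 and value == value
    if isinstance(value, str):
        return len(value) > 0      # only the length matters, not the content
    raise TypeError("unsupported value in this sketch")

def xs_boolean(text: str) -> bool:
    """Cast a string to boolean by its content, as xs:boolean() does."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not a valid boolean literal: " + text)

print(not ebv("false"))         # False: the non-empty string is true
print(not xs_boolean("false"))  # True: the cast reads the content
print(ebv("1") and ebv("0"))    # True: both strings are non-empty
```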

XQuery allows the and and or operators to short-circuit (on both sides), meaning that implementations are allowed to evaluate either operand first, and then if its value determines the result of the entire expression, the other operand doesn't need to be evaluated at all. For example, false() and error() can result in false or raise an error; so can error() and false(). Consequently, these operators have implementation-defined behavior in the face of errors.

In contrast, the if/then/else conditional statement always evaluates the if condition, and then if the condition is true it evaluates the then branch, otherwise it evaluates the else branch. The else branch isn't optional, although you can often return the empty sequence to achieve the same effect. Like the other boolean operators, the condition is converted to xs:boolean by applying the Effective Boolean Value.

As shown in Listing 5.29, any number of conditionals may be chained together, like so: if/then/else if/then/else if/then/else …, in which the conditions are tested one after another until one of them is found to be true, in which case its then branch is taken; if none is true, the final else branch is evaluated. Exactly one branch is evaluated. The branches of a conditional need not have the same types.

Example 5.29. Conditional statements

if (exists(x/y/z)) then "yes" else "no"
if ($x = 'a') then 1 else if ($x = 'b') then 2 else 0

Query Prolog

Every XQuery module may begin with a query prolog, which is optional. The query prolog may first declare a version, followed by certain declarations and imports in any order. After these come the user-defined functions, if any. Together, these set up the initial static context for the query. Every declaration and function definition must be followed by a semicolon (see Table 5.5).

Table 5.5. Declarations that may appear in a query prolog

Declaration                                                 Meaning

xquery version "1.0";                                       Specify the XQuery version

declare xmlspace mode;                                      Set the default xmlspace policy

declare default collation "uri";                            Set the default collation

declare base-uri "uri";                                     Declare the base-uri

declare namespace prefix = "uri";                           Declare a namespace prefix

declare default element namespace "uri";                    Set the default element namespace

declare default function namespace "uri";                   Set the default function namespace

declare variable $name { expr };                            Define a global variable
declare variable $name as type { expr };

declare variable $name external;                            Declare an external global variable
declare variable $name as type external;

module prefix = "uri";                                      Declare this to be a library module and
                                                            bind the namespace prefix

import module "uri";                                        Import a library module into this one and
import module "uri" at "hint";                              optionally declare a namespace prefix
import module namespace prefix = "uri";
import module namespace prefix = "uri" at "hint";

import schema "uri";                                        Import schema types and optionally
import schema "uri" at "hint";                              declare a namespace prefix
import schema namespace prefix = "uri";
import schema namespace prefix = "uri" at "hint";
import schema default element namespace "uri";
import schema default element namespace "uri" at "hint";

declare validation mode;                                    Set the default validation mode

Chapter 4 described modules and user-defined functions. In this section, we focus on the other items that make up the query prolog.

Version Declaration

Optionally, a query may declare the XQuery version to which it has been authored; at this time, the only supported version is "1.0". This feature exists in anticipation of future XQuery versions (see Chapter 14), but for now it always takes the form shown in Listing 5.30.

Example 5.30. XQuery version

xquery version "1.0";

XML Space Declaration

The XML space declaration controls how whitespace in the query is treated during element construction (see Listing 5.31). The whitespace policy can also be changed later in the query by element constructors. See Chapter 7 for details.

Example 5.31. XML space declaration

declare xmlspace preserve;

The XML space declaration can have one of two values, strip or preserve, written as keywords, not strings. The default is strip. More than one XML space declaration in the prolog results in an error.

Base URI Declaration

The base URI declaration, shown in Listing 5.32, changes the base-uri. The base-uri is mainly used by the doc() function to resolve relative URIs.

Example 5.32. Base URI declaration

declare base-uri "http://www.awprofessional.com/";

Default Collation Declaration

The default collation declaration affects text processing, including string comparisons and sorting. The only collation that implementations are required to support is http://www.w3.org/2003/11/xpath-functions/collation/codepoint, which is also the default when no default collation declaration is used.

The default collation declaration takes a constant string value, which must be supported by the implementation. Listing 5.33 shows a hypothetical collation. More than one collation declaration results in an error.

Example 5.33. Default collation declaration

declare default collation
           "http://www.awprofessional.com/xquerycollations/en-us";

The default collation cannot be changed by any other expression, although most functions that are affected by collation accept an optional argument that explicitly specifies the collation to use. See Chapter 8 for an explanation of collations.

Namespace Declarations

The XQuery prolog may contain any number of namespace declarations. These introduce a prefix/namespace binding into scope for all the rest of the XQuery. As with collation declarations, each namespace value must be a constant string value. The namespace must not be the empty string, and the prefix must not begin with the characters “xml” (having any case, such as “XmL”). Listing 5.34 provides two examples.

Example 5.34. Namespace declarations

declare namespace foo = "urn:bar";
declare namespace awl = "http://www.awprofessional.com/";

Multiple declarations for the same prefix in the query prolog raise an error, although namespace declarations in an XML element constructor may override them without error (see Chapter 7).

The five namespace declarations listed in Table 5.6 are built in, and correspond to the XQuery Function, XQuery Data Types, XML, XML Schema, and XML Schema Instance namespaces, respectively. Whenever these prefixes occur in this book, they are bound to their default namespace values. You may override these namespace prefixes in the prolog if you wish, although this isn't recommended (unless you're purposely trying to intercept all XQuery built-in functions and/or data types).

Table 5.6. Built-in namespaces

Prefix   Namespace

fn       http://www.w3.org/2003/11/xpath-functions
xdt      http://www.w3.org/2003/11/xpath-datatypes
xml      http://www.w3.org/XML/1998/namespace
xs       http://www.w3.org/2001/XMLSchema
xsi      http://www.w3.org/2001/XMLSchema-instance

It is also possible to declare two namespaces to be used by default in unprefixed names. The default element namespace affects unprefixed element names in navigation and construction; the default function namespace affects unprefixed function names in function invocations (excluding the function name used in the declare function). Both are shown in Listing 5.35. An error results when there is more than one default element namespace or more than one default function namespace declaration in the query prolog.

Example 5.35. Default namespace declarations

declare default element namespace "http://your.org/";
declare default function namespace "http://your.org/";

If a default element namespace isn't declared in the prolog, then all unprefixed element names belong to no namespace. Otherwise, they belong to the declared default element namespace. Unprefixed function names used in function invocations use the default function namespace. If one isn't provided by the user, then it defaults to the built-in namespace, http://www.w3.org/2003/11/xpath-functions (which is also bound to the prefix fn).

Global Variable Declarations

The prolog may declare global variables and external parameters (if the implementation supports externals). Global variables have a value computed in the query itself; external parameters are declared in the query, but their values are completely unknown until runtime.

Both are declared using declare variable, followed by a variable name, optionally the static type of the variable, and then either the variable value enclosed in curly braces, or else the external keyword indicating that it is an external parameter, and finally the trailing semicolon. The examples in Listing 5.36 demonstrate the various kinds of variable declarations.

Example 5.36. Global variable declarations

declare variable $zero { 0 };
declare variable $decimalZero as xs:decimal { 0.0 };
declare variable $inf as xs:double { 1E0 div $zero };
declare variable $userName as xs:string external;
declare variable $userDoc { doc(concat($userName, ".xml")) };

The variable value must match the declared type. If no type is specified, then the type is either the static type of the value expression, or else xs:anyType if the variable is external.

Variable values may refer to other variables defined before them; they may also refer to functions imported from other modules (provided the import module statement occurs before the variable) or built-in functions. There is no way to assign a default value to external parameters (as in XSLT); if an external parameter value isn't supplied when executing the query, or if its value doesn't match the variable type, then an error is raised.

Module Imports and Declaration

As explained in Chapter 4, XQuery queries are organized into modules. A library module contains a module declaration and no query body; a main module doesn't contain a module declaration but may contain a query body. The module declaration appears in the query prolog, and consists of the keyword module followed by a string literal that is the target namespace for this module, as shown in Listing 5.37.

Modules are imported by target namespace into other modules (library or main) using an import module statement in the query prolog. This imports all the global variables and user-defined functions from that module into the current one. All types used in the function signatures and global variable types must be defined in the current module, or else an error is raised.

Example 5.37. Library modules contain a module declaration but no query body

module "urn:my-library";
declare variable $version { "1.0" };
declare function description() { "This is a library module" };

Example 5.38. Modules can be imported into other modules

import module namespace my = "urn:my-library";
if ($my:version = "1.0") then my:description()
else error("Library version mismatch")

The import module may optionally assign a prefix to the module namespace as in Listing 5.38, and may optionally suggest a module location to the implementation. See Appendix B for details.

Schema Imports and Validation Declaration

Finally, the query prolog may also import XML schema definitions into an XQuery, so that the types defined in the schemas are available to the query, and may also define a default validation mode.

The import schema statement is similar to the import module statement. It specifies the namespace that is imported, optionally declares a prefix for that namespace, and optionally suggests a location hint to the implementation. In addition, the import schema statement may assign the namespace to the default element or default function namespace. The example in Listing 5.39 imports a schema, and then declares a variable using one of the types from that schema.

Example 5.39. Schema types can be imported

import schema namespace my = "urn:my-types" at "mytypes.xsd";
declare variable $my:zipcode { "98052" cast as my:zip };

The query prolog may also override the default validation mode (which is lax) using the validation declaration shown in Listing 5.40. This declaration takes one of the keywords lax, skip, or strict, and sets that to be the default validation mode for the validate operator and element constructors. The validation mode can also be changed in the query during validation; see Chapter 9 for details.

Example 5.40. Sample validation mode declaration

declare validation strict;

Conclusion

XQuery, like all good query languages, is filled with a rich variety of expressions. These form the basic vocabulary out of which you can write many XQuery programs.

In this chapter, we first explored the operators that can compare items by value, node identity, and document order. We also examined the deep-equal() function, which compares sequences exactly (using deep structural equality on nodes). Next, we reviewed the functions and operators that are dedicated to sequence manipulation, including constructing and filtering sequences, as well as computing aggregate values such as the maximum or length of a sequence.

We also investigated the XQuery arithmetic functions and operators, which build on the numeric type support and type promotions described in Chapter 2. XQuery provides basic facilities for simple arithmetic calculations and functions for four rounding modes. Higher-order arithmetic operations such as exponentiation, logarithms, trigonometry, factorials, random numbers, or bitwise manipulation are not built in, although some implementations may provide them as extension functions.

From there, we progressed to the XQuery facilities for expressing logical conditions. XQuery supports two-valued Boolean logic, with its two constants (true(), false()) and the usual Boolean operators (and, or, not()). XQuery also supports conditional expressions using the if/then/else keywords, as well as other more complex operators, such as existential and universal quantification (which are described in the next chapter). And finally, we ended this chapter with an examination of the query prolog, which controls the context in which XQuery expressions are compiled and evaluated.

Further Reading

For more information about floating-point numbers, I highly recommend the article “What Every Computer Scientist Should Know About Floating-Point Arithmetic” by David Goldberg in ACM Computing Surveys, 1991, which is also available online from http://citeseer.nj.nec.com/goldberg91what.html. IBM maintains a fantastic archive of information about decimal arithmetic online at http://www2.hursley.ibm.com/decimal/decimal.html, which also contains pointers to information about floating-point arithmetic and rounding modes.

For a dense but practical book on logic from the software engineering point of view, look no further than the book Logic in Computer Science: Modelling and reasoning about systems by Michael Huth and Mark Ryan, and its Web site, http://www.cs.bham.ac.uk/research/lics. If you're a professional software developer, this is probably the last logic book you will ever need.
