Chapter 11

XQuery 1.0 Definition

11.1 Introduction

After introducing you to XQuery in Chapter 10, and mentioning different aspects of the language in various places throughout this book so far, we’re ready to get into the details of the W3C’s XML Query Language, XQuery 1.0.

You already read something about the history of XQuery’s development in the W3C (in Chapter 10, for example) and the requirements that led to the language currently progressing through the W3C’s Recommendation process.1 In addition, you’ve seen in Chapter 10 the “big picture” view of the suite of documents that have been developed under the umbrella of XQuery.

Chapter 6, “The XML Information Set (Infoset) and Beyond,” introduced you to the XQuery 1.0 and XPath 2.0 Data Model (“XQuery Data Model,” or just “XDM” for brevity), and Chapter 10 provided more detail. Consequently, the Data Model is not addressed in any depth in this chapter.

The bulk of the chapter is spent on the details of the XQuery syntax and semantics, including the contexts in which XQuery exists and is executed; the formal semantics of the language, including the static typing facility; the rather large collection of functions and operators available to the language; and the mechanisms for transferring the results of an XQuery expression evaluation to the outside world (serialization).

We don’t expect that, when you finish this chapter, you’ll be an instant XQuery expert, but we do believe that you’ll be equipped to start experimenting with XQuery implementations and prototyping applications based on XQuery. In Appendix A: The Example, we have provided an extended example to show how XQuery and its companion specifications would be used in realistic situations.

11.2 Overview of XQuery

XQuery is, according to some observers, a large language. We do not completely agree with that observation, being very familiar with much larger languages (including Ada, COBOL, and SQL). Of course, we must acknowledge that there is a lot to absorb from the entire suite of XQuery-related documents. But we have found that, by taking in one document at a time and understanding the basic concepts specified in that document, it’s not difficult to get a good feel for the language as a whole.

To understand how XQuery works, you first need to understand the environment in which it works – its context and how it is processed. In the remainder of this section, we describe some important concepts, the contexts (both static and dynamic) of a query, and the processing model used to evaluate an XQuery expression.

11.2.1 Concepts

Every language has concepts that are necessary to an understanding of the language and how to use it. XQuery is no exception. Here are a few terms we consider especially important to know.

• Document order – This term is defined in the XQuery Data Model, but it is used in specifications as basic as the Infoset2 (about which you read in Chapter 6, “The XML Information Set (Infoset) and Beyond”). The term is in sufficiently wide use that we do not repeat its definition here.

• Sequence – The sequence is the most fundamental kind of value in the data model. A sequence is an ordered collection of zero or more items. A sequence that contains no items is called an empty sequence. A sequence containing one item is completely indistinguishable from that item by itself; it is called a singleton sequence. A consequence of this last provision is that every atomic value and every complex value is indistinguishable from a singleton sequence containing that value. Another consequence is that the Data Model does not support sequences of sequences.

    To illustrate these concepts, we use parentheses to enclose each sequence. Thus, (“Be afraid – be very afraid”) is a singleton sequence that is indistinguishable from the character string that it contains. Similarly, ( ) is an empty sequence, and (3.14159, 2.71828, 0.5772) is a sequence of three decimal numbers. A sequence like (1, 2, (3, 4), 5) cannot be in the Data Model, because the Data Model does not support sequences that contain other sequences; that sequence is “flattened” into the sequence (1, 2, 3, 4, 5).
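    The flattening rule can be checked directly. The following is a minimal sketch, assuming an XQuery 1.0 processor in which the fn prefix is predeclared; the comments show the values we would expect.

```xquery
(: Nested parentheses never produce nested sequences; the result is flat. :)
fn:count( (1, 2, (3, 4), 5) ),                        (: 5 :)
fn:count( ( ) ),                                      (: 0 :)
fn:deep-equal( (1, 2, (3, 4), 5), (1, 2, 3, 4, 5) ),  (: true :)

(: A singleton sequence is the same thing as the item it contains. :)
fn:deep-equal( ("Be afraid"), "Be afraid" )           (: true :)
```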

• Atomization – Atomization is applied to a sequence when the sequence is used in a context in which a sequence of atomic values is required. The result of atomization is either a sequence of atomic values or a type error. Formally, atomization is the result of invoking the fn:data( ) function on the sequence. That result is the sequence of atomic values produced by applying the following rules to each item in the input sequence:

– If the item is an atomic value, it is returned.

– If the item is a node, its typed value is returned (an error is raised if the node has no typed value).

When atomization is applied to the sequence (3) – which, you read just above, is identical to the number 3 – the result is (3). Atomization applied to the sequence (87, “Four score and seven years ago”) results in the exact same sequence. Atomizing the sequence (<address>Gettysburg</address>, 42, <a>Today <b>is a day </b>which will live in infamy</a>) results in the sequence (“Gettysburg”, 42, “Today is a day which will live in infamy”).
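Because atomization is defined in terms of fn:data( ), the last case can be sketched as an explicit call. We assume here that the two elements are constructed directly in the query and carry no schema-derived type, so their typed values are untyped atomic values.

```xquery
(: Atomic values pass through unchanged; nodes are replaced by their typed values. :)
fn:data( ( <address>Gettysburg</address>,
           42,
           <a>Today <b>is a day </b>which will live in infamy</a> ) )
(: expected: "Gettysburg", 42, "Today is a day which will live in infamy" :)
```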

• Effective Boolean value (EBV) – Formally, the effective Boolean value of an expression is the result of invoking the fn:boolean() function on the value of the expression. That result is a Boolean value produced by applying the following rules, in this order:

– If its operand is an empty sequence, fn:boolean( ) returns false.

– If its operand is a sequence whose first item is a node, fn:boolean( ) returns true.

– If its operand is a singleton value of type xs:boolean or of a type derived from xs:boolean, fn:boolean( ) returns the value of its operand unchanged.

– If its operand is a singleton value of type xs:string, xdt:untypedAtomic, or a type derived from one of these, fn:boolean( ) returns false if the operand value has zero length and true otherwise.

– If its operand is a singleton value of any numeric type or derived from a numeric type, fn:boolean( ) returns false if the operand value is NaN or is numerically equal to zero and true otherwise.

– In all other cases, fn:boolean( ) raises a type error.
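The rules above can be exercised one at a time. This is a sketch under the assumption of an XQuery 1.0 processor; each comment names the rule that we expect to apply.

```xquery
fn:boolean( ( ) ),               (: false: empty sequence :)
fn:boolean( (<a/>, 0) ),         (: true: the first item is a node :)
fn:boolean( fn:false() ),        (: false: a Boolean value is returned unchanged :)
fn:boolean( "" ),                (: false: zero-length string :)
fn:boolean( "false" ),           (: true: a string of nonzero length :)
fn:boolean( 0 ),                 (: false: numerically equal to zero :)
fn:boolean( xs:double("NaN") ),  (: false: NaN :)
fn:boolean( 42 )                 (: true :)
```

An operand such as (1, 2), a sequence of more than one item whose first item is not a node, falls under the last rule and raises a type error.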

• String value – Every node has a string value. The string value of a node is a string and, formally, is the result of applying fn:string () to the node. Less formally, the string value of a node is the concatenation of the string values of all of its child nodes, in document order (this includes both child element nodes and child text nodes). The string value of a node that doesn’t have any child nodes – a text node, for example – is simply the string representation of the value of that node.

• Typed value – Every node has a typed value. The typed value of a node is a sequence of atomic values and, formally, is the result of applying fn:data( ) to the node. Less formally, the typed value of a node is the result of converting the string value of the node into a value of the node’s type. For some node types (such as nodes whose type is xs:string, as well as comment or processing instruction nodes), the typed value is the same as the string value. For other node types (such as a node that has a type annotation indicating that its value is of type xs:decimal), the typed value is the result of converting the string value to that type (xs:decimal, in this case); if the conversion fails, then an error is raised by the fn:data( ) function.
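The difference between the two accessors can be sketched as follows. The variable name $a is ours, and we assume that the constructed element has no schema-derived type.

```xquery
let $a := <a>Today <b>is a day </b>which will live in infamy</a>
return (
  fn:string($a),    (: "Today is a day which will live in infamy" :)
  fn:string($a/b),  (: "is a day " :)
  fn:data($a)       (: the same characters as fn:string($a), but as an
                       untyped atomic value rather than an xs:string :)
)
```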

11.3 The XQuery Processing Model

The XQuery Processing Model is a description of how an XQuery processor interacts with its environment and what steps it must take in order to evaluate a query. The XQuery 1.0 specification contains a very nice diagram to describe the Processing Model, which we have adapted in Figure 11-1.

image

Figure 11-1 XQuery Processing Model.

This Processing Model has several aspects worth noting.

• The Data Model instances can be created by parsing, and perhaps validating, XML documents, thereby creating Infosets or PSVIs. The Data Model spec describes how to derive a Data Model instance from an Infoset or a PSVI. They can also be created by other means, such as by programs that directly generate Data Model instances for XQuery engines to evaluate.

• Similarly, the In-Scope Schema Definitions can be created by parsing XML Schema documents, thereby generating XML Schemas. Alternatively, they can be created by other means, analogous to direct Data Model instance generation.

• The static context is initialized from the environment (e.g., by the XQuery implementation), as is the dynamic context. Both are affected by other parts of the Processing Model, such as the inclusion of in-scope schema definitions.

• The execution engine (perhaps “evaluation engine” would be more appropriate) acts on the Data Model instances provided to the XQuery expression being evaluated and (normally) generates other Data Model instances. Those instances may be serialized when query evaluation has completed, but the Processing Model does not require that. Data Model instances so generated can be passed directly to other processes, perhaps another XQuery expression evaluation.

• The execution engine depends on the dynamic context and, by implication, on the static context, while the process of parsing an XQuery expression, and converting it into whatever internal execution constructs that the execution engine uses, depends only on the static context.

Actual XQuery implementations will undoubtedly use variations of this Processing Model – for example, some implementations might not support direct generation of Data Model instances – but they will all provide the same essential capabilities indicated by this Model.

11.3.1 The Static Context

Whenever an XQuery expression is processed, the set of initial conditions governing its behavior is called the static context. The static context is a set of components, with values that are set globally by the XQuery implementation before any expression is evaluated. The values of a few of those components can be modified by the query prolog (see Section 11.7), and the values of a very few can be modified by the actions of the query itself. Figure 11-1 illustrates just where in the XQuery processing model the static context is used.

Table 11-1, adapted from the XQuery 1.0 specification, summarizes the components of the XQuery static context and how their values can be changed. For an explanation of the meaning of each component identified in the first column of the table, please refer to the XQuery 1.0 specification.

Table 11-1

XQuery Static Context Components

image

image

In the headers of the rightmost three columns, “Implementation” means “the XQuery implementation,” “Prolog” means “the XQuery prolog,” and “Expression” means “the expression itself.” Within the rows of those columns, an “s” means that the corresponding object can set the value of the component, an “a” means that the object can augment3 the value of the component, and a dash (“–”) means that the object cannot change the value of the component.

When a component’s value can be changed by the implementation or by the prolog, the initial value can be overwritten, augmented, or both, depending on the component: some components can be overwritten and others cannot; likewise, some can be augmented and others cannot. For such fine detail, we recommend that you consult the XQuery 1.0 specification.

11.3.2 The Dynamic Context

The dynamic context represents aspects of the environment that may change during the evaluation of an XQuery or that might be changed by environmental factors other than the XQuery implementation itself. Some people, us included, view the static context as part of the dynamic context; others don’t.

Table 11-2, also adapted from the XQuery 1.0 specification, summarizes the components of the XQuery dynamic context and how their values are set. For an explanation of the meaning of each component identified in the first column of the table, please refer to the XQuery 1.0 specification. In this table, “y” means that the corresponding object can change the value of the component and “–” means that it cannot.

Table 11-2

XQuery Dynamic Context Components

image

image

For additional details, we recommend that you consult the XQuery 1.0 specification.

11.4 The XQuery Grammar

Appendix C: XQuery 1.0 Grammar contains the complete XQuery 1.0 grammar in EBNF (Extended Backus-Naur Form). In the following sections, we refer to a number of nonterminal symbols defined in that grammar without elaborating on them in the text of this chapter. Please refer to that appendix to see the definitions of those symbols in context. In addition, the EBNF conventions used to define the XQuery grammar are given in that appendix.

Before we get started with our discussion on XQuery expressions, there’s one subject to address that doesn’t obviously fit anywhere else: XQuery comments. Like all good (and most bad) programming languages, XQuery allows its users to embed comments into XQuery expressions. XQuery’s chosen comment syntax is often called “smiley comments” because of the delimiting characters chosen to start and end those comments. A comment is started with the sequence “(:” and terminated with the sequence “:)”, which bear a striking resemblance to those well-known emoticons used in ordinary text messages.

XQuery comments can be used anywhere that ignorable whitespace is acceptable. An XQuery comment can contain any string of characters, except that it must not contain “:)”, which would cause the text following that sequence to be interpreted as part of the query itself. Comments can be nested to any level, which means that “(:” within a comment will be interpreted as the beginning of a nested comment.
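A short sketch of both properties, placement and nesting, assuming an XQuery 1.0 processor:

```xquery
(: A comment can appear wherever ignorable whitespace is allowed, :)
1 (: even between the tokens :) + (: of an expression :) 2,

(: Comments nest: (: this inner comment ends here :)
   and the outer comment continues until its own terminator. :)
"done"
```

We would expect this query to evaluate to the sequence (3, “done”); none of the comment text takes part in the evaluation.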

11.5 XQuery Expressions

XQuery is, as you’ve read elsewhere in this book, a functional language. The XQuery 1.0 specification says in Section 2 Basics: “XQuery is a functional language, which means that expressions can be nested with full generality.” It continues: “(However, unlike a pure functional language, it does not allow variable substitutability if the variable declaration contains construction of new nodes.)”

More generally, a functional (programming) language4 is one that encourages a style of programming that emphasizes the value of expressions instead of the algorithms by which those values are computed. (Languages that focus on procedural mechanisms for computing values are sometimes called imperative programming languages; languages that encourage the statement of the problem in a nonprocedural way, allowing the system to determine the best way to solve the problem, are often called declarative languages.) Expressions in a functional language are formed by building them up from smaller expressions (in some languages, those smaller expressions are literally functions – subprograms, if you will). For example, the expression “2*(3+4)” computes the product of 2 and a number that is itself computed as the sum of 3 and 4 – that’s a very functional way of expressing such a computation. In an imperative programming language, you might instruct the computer system to do something like this by a sequence of instructions (shown here in pseudo-code rather than in any particular language):

image

(Of course, virtually all modern programming languages allow expressions such as “2*(3+4)” to be written directly, but our point is made.)

One characteristic of many functional languages is that the “functions” (including all expressions in the language) are free of side effects. A side effect in this context is a computational effect that persists even after the computation has completed. A common example of a side effect in data management systems is updating persistent data on a mass storage device. Arguably, another kind of side effect is printing or displaying results onto some output device.

Most useful programs involve side effects, at least of this last kind. Even languages, such as SQL, that have side effects such as changing the values of persistent data can behave as a functional language when they are evaluating expressions (dividing their operation into the functional aspect of computing a value in a nonprocedural manner, followed by a phase of causing side effects). Many languages that appear to be functional in nature are not always so, if they support the use of functions written in some other programming language and that other language permits side effects to take place.

XQuery is a functional language because its expressions are made of other, “smaller” expressions, down to the irreducible level of literal values, references to variables and parameters, and function invocations. It is, for now, a side effect-free language – as long as no external functions are used that generate side effects. We say “for now” because it is inevitable (as you’ll read in Chapter 13, “What’s Missing?”) that the XQuery language will be extended to support updating of XML, possibly in persistent stores – and that, by almost any definition, is a side effect.

In Appendix C: XQuery 1.0 Grammar, we show you the syntax of XQuery’s modules. In that grammar, the rather important BNF nonterminal symbol QueryBody is not resolved. As you might expect, knowing that XQuery is a functional language, a QueryBody is simply an expression, as shown in Grammar 11-1. (See Appendix C: XQuery 1.0 Grammar for an explanation of how to read these EBNF productions.)

Grammar 11-1   Syntax of a Query Body

image

Grammar 11-4 illustrates the syntax of expressions, but the basic primitive expressions in XQuery are called primary expressions, the syntax of which is found in Grammar 11-2. These expressions are used to build up more complex expressions in an XQuery.

Grammar 11-2   Primary Expression Syntax

image

image

In the following subsections, we discuss each of these primary expressions as well as some of the other expressions that make up the XQuery language. We have not rigidly ordered these subsections according to the sequence of alternatives in various grammar productions; instead we have organized the discussion by starting with simpler kinds of expressions before dealing with more complex ones.

11.5.1 Literal Expressions

The most primitive kind of expression in XQuery is a literal. A literal is a character string that lies in the lexical space5 of one or more data types (the lexical space of a data type is the collection of character strings that can be used to express any possible value of that data type). For example, the character string 1.1E1 is a literal lying in the lexical space of the XML Schema data types xs:float and xs:double, while ‘Mars Attacks!’ is a literal lying in the lexical space of xs:string, and 3.14159 is a literal in the lexical space of xs:decimal, xs:float, and xs:double.

Broadly speaking, a literal is an expression whose value is itself. Therefore, the value of 3.14159 is, well, 3.14159 and the value of 1.1E1 is 11. As you see in Grammar 11-3, numeric literals come in three “flavors”: integer literals, decimal literals, and double literals. String literals can be enclosed in double quotes (") or in apostrophes (', sometimes called “single quotes”). The characters permitted in a string literal include all Unicode characters other than a bare ampersand (&), which must be written as an entity reference or a character reference. To include a double quote in a literal enclosed in double quotes, you simply, well, double it. Similarly, to include an apostrophe in a literal enclosed in apostrophes, you simply use two consecutive apostrophes. You can also use character references of the form &#xnnnn; (where “nnnn” is one to six hex digits specifying the Unicode code point for the desired character). This comes in handy when you need to use characters that might not appear on your keyboard, such as &#x222D; for the character triple integral: ∭. Finally, you can use one of the five predefined entity references defined by XML itself (&lt;, &gt;, &amp;, &quot;, and &apos;).

Grammar 11-3   Syntax of Literals

image
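The following sketch collects one literal of each kind described above. The particular strings are ours, chosen only for illustration.

```xquery
42,                      (: integer literal, of type xs:integer :)
3.14159,                 (: decimal literal, of type xs:decimal :)
1.1E1,                   (: double literal, of type xs:double, with the value 11 :)
"She said ""hello""",    (: a doubled double quote inside a double-quoted literal :)
'It''s a film',          (: a doubled apostrophe inside an apostrophe-delimited literal :)
"Mars &amp; Beyond",     (: predefined entity reference for the ampersand :)
"&#x222D;"               (: character reference for the triple integral sign :)
```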

11.5.2 Constructor Functions

An expression that is almost as primitive as a literal is a constructor function invocation. As you saw earlier, the value 3.14159 is a valid literal in the lexical space of three data types: xs:decimal, xs:float, and xs:double. Because XQuery is a strongly typed language, it’s sometimes necessary to specify more precisely the data type you want a literal to be.

XQuery provides constructor functions for this purpose. Constructor functions are defined in the Functions and Operators specification (about which you learned in Chapter 10), but they are sufficiently central to XQuery itself that we briefly mention them here. For the purposes of the XQuery grammar, constructor functions are invoked exactly like ordinary functions.

To ensure that your literal 3.14159 is a value of xs:double, you can write the constructor function invocation xs:double(3.14159) or the equivalent constructor function invocation xs:double("3.14159"). If your query requires a decimal value instead, you could simply write xs:decimal(3.14159). However, XQuery is clever enough to infer that the literal 3.14159 by itself is intended to be of type xs:decimal and not of type xs:double or of type xs:float.

By contrast, values of some data types can be constructed only with an explicit constructor function invocation or an explicit cast from string to the desired data type. For example, to express the date commonly known as American Independence Day, it is not sufficient to write 1776-07-04. XQuery would interpret that to mean “the integer value 1765” (the result of subtracting 7 from 1776 and then subtracting 4 from that result). Instead, one must write xs:date("1776-07-04") or "1776-07-04" cast as xs:date.

XQuery automatically provides a constructor function for every built-in data type defined in XML Schema Part 2, as well as for every data type derived from them in any schemas that you might import (see Section 11.7) in your query.
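The cases discussed in this section can be placed side by side. This is a minimal sketch, assuming only the built-in constructor functions; the comments give the results we would expect.

```xquery
3.14159,                       (: an xs:decimal value, inferred from the literal :)
xs:double(3.14159),            (: an xs:double value :)
xs:double("3.14159"),          (: the same xs:double value, constructed from a string :)
1776-07-04,                    (: arithmetic, not a date: the xs:integer 1765 :)
xs:date("1776-07-04"),         (: an xs:date value :)
"1776-07-04" cast as xs:date   (: the same xs:date value, written as a cast :)
```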

11.5.3 Sequence Constructors

Another simple kind of XQuery expression is the sequence constructor. There are many ways in XQuery of generating a sequence – after all, the fundamental building block in the Data Model is the sequence (and every XQuery value is a sequence of zero, one, or more items). As a result, it is technically accurate to say that every expression evaluates to a sequence.

In fact, the XQuery grammar fragment in Grammar 11-4 defines XQuery expressions in general to be a sequence of ExprSingles. Note that this is not a type of primary expression.

Grammar 11-4   Syntax of Sequence Construction

image

image

There are only two types of sequence constructor in XQuery. One of these uses commas (,) as the operator that constructs a sequence from two items, as illustrated in Example 11-1. In general, this first form of sequence constructor can be used in any context where a general Expr is appropriate; however, in a context where a single value (ExprSingle in XQuery terms) is required, a sequence constructed using the comma operator can be used only when enclosed in parentheses. (This is because a sequence of values separated by commas without surrounding parentheses is not recognized in the XQuery grammar as a single value. It requires the parentheses to group the values into a single value that is a sequence.) By convention in this book, we enclose all such sequences in parentheses.

Example 11-1   Construction of a Sequence Using the Comma Operator

image

Don’t be fooled: the result of the sequence constructor in Example 11-1 is entirely different from the character string ‘This reviewer gives 3 stars to this film’. Example 11-1 results in a sequence of five items:6 an xs:string value, an xs:integer value, and three more xs:string values. By contrast, the character string is a sequence of only one item: a single xs:string value.

Sequences constructed with comma operators are not limited to containing items of atomic types. They can contain any sort of item, including elements, XML comments, and so forth (but, as you read earlier in this chapter, not other sequences).

The other kind of sequence constructor is called the range expression, written RangeExpr in Grammar 11-4. (The BNF nonterminal symbol AdditiveExpr is addressed in the next subsection, Arithmetic Expressions. For our purposes here, it’s just an expression that evaluates to a value of type xs:integer.)

A range expression constructs a monotonically increasing sequence of consecutive integers beginning with the value of the first (or only) AdditiveExpr in the RangeExpr and ending with the value of the last (or only) AdditiveExpr in the RangeExpr, as illustrated in Example 11-2, which constructs the sequence (5, 6, 7, 8, 9, 10).

Example 11-2   Construction of a Sequence Using the Range Expression

image

If the second AdditiveExpr is specified and its value is less than the value of the first AdditiveExpr, then the RangeExpr evaluates to an empty sequence, represented in XQuery as an empty pair of parentheses: ( ). (If you need to generate a sequence of consecutive integers in descending order, you can apply the F&O function fn:reverse( ) to a range expression.)
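These rules can be sketched with a few small ranges; the bounds are arbitrary values chosen for illustration.

```xquery
5 to 10,               (: 5, 6, 7, 8, 9, 10 :)
fn:count( 7 to 7 ),    (: 1: a range whose bounds are equal has a single item :)
fn:count( 10 to 5 ),   (: 0: the second value is less than the first :)
fn:reverse( 5 to 10 )  (: 10, 9, 8, 7, 6, 5 :)
```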

11.5.4 Variable References

Variable references are another kind of primitive expression in XQuery. As you can see in Appendix C: XQuery 1.0 Grammar, XQuery allows the declaration of variables in query prologs. In addition, variables are declared as part of certain other expressions, particularly the for and let clauses of FLWOR expressions. Of course, those variables are of little use unless they can be referenced in XQuery expressions. The name of a variable is a QName and is always preceded by a dollar sign ($) when the variable is being defined and when it is being referenced. Grammar 11-5 provides the syntax of variable references. Recall that a QName is made up of two parts: an optional namespace URI, lexically represented by a namespace prefix, and a local name. Two variable references reference the same variable if their local names are the same and the namespace URIs bound to their namespace prefixes are the same.

Grammar 11-5   Syntax of Variable Reference

image

A variable reference raises a static error if there is no variable of the same QName in the in-scope variables (see Table 11-1). If there exists a variable named “studio,” then that variable is referenced using “$studio”.

11.5.5 Parenthesized Expressions

A parenthesized expression is exactly what its name implies – an expression surrounded by parentheses, as expressed in Grammar 11-6. Note that the contained expression is, in fact, optional, allowing a bare pair of parentheses – ( ) – to be used as the representation of an empty sequence.

Grammar 11-6   Syntax of Parenthesized Expressions

image

In an XQuery expression, parentheses can be used to force a desired precedence of operators that is different from the default precedence. For example, the expression “2*3+4” has a different result – 10 – than the expression “2*(3+4)” – 14. Parentheses can also be used where they don’t change the semantics of an expression, perhaps to make the precedence in an expression explicit, or for aesthetic purposes.
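A brief sketch of the three uses of parentheses mentioned in this section:

```xquery
2 * 3 + 4,        (: 10: multiplication binds more tightly than addition :)
2 * (3 + 4),      (: 14: parentheses force the addition to happen first :)
(2 * 3) + 4,      (: 10: parentheses that only make the default precedence explicit :)
fn:count( ( ) )   (: 0: a bare pair of parentheses is the empty sequence :)
```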

11.5.6 Context Item Expression

In Chapter 9, “XPath 1.0 and XPath 2.0,” as well as in Section 11.2, you learned that XPath and XQuery nearly always have a context item (as well as a context position and context size) that is used as the context in which many expressions are evaluated.

In XQuery, the context item is referenced using the syntax shown in Grammar 11-7. This is commonly referred to as “dot.”

Grammar 11-7   Syntax of Context Item Expression

image

A context item expression evaluates to the context item (which may be either a node or an atomic value). Evaluation of a context item expression when the context item is undefined results in an error.
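As a sketch, a predicate is one place in which the context item is defined: each item of the sequence being filtered becomes the context item in turn. The sequences used here are ours, and their items are atomic values rather than nodes.

```xquery
(1 to 5)[. mod 2 = 1],                            (: 1, 3, 5 :)
("Mars", "Attacks!")[fn:string-length(.) gt 4]    (: "Attacks!" :)
```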

11.5.7 Function Calls

A function call, like almost everything else in XQuery, is an expression. In Appendix C: “XQuery 1.0 Grammar,” we see that functions are declared using syntax that includes the name of the function (a QName) and a pair of parentheses that optionally includes a comma-separated list of parameter declarations. Once declared, a function can be invoked as part of an XQuery expression, returning a value of the type specified when the function was declared. The Functions and Operators specification defines a number of functions that are always available to use in XQuery expressions. Other functions can be made available for use in an XQuery expression in three ways: They can be declared in the XQuery prolog, they can be imported from a library module, and they can be provided by the external environment as part of the static context.

A function call (or function invocation), shown in Grammar 11-8, bears some resemblance to the function declaration syntax mentioned in the previous paragraph. The important difference, of course, is that a function call specifies arguments that provide the values for the function’s parameters.

Grammar 11-8   Function Call Syntax

image

When a function call is evaluated, the name of the function has to be equal to the name of a function in the static context, and the number of arguments in the function call must be equal to the number of parameters in the function’s declaration.

Function calls are evaluated in several steps.

1. Each argument is evaluated. Multiple arguments can be evaluated in any order – and might not be evaluated at all, if the implementation can determine the result of the function without knowing the value of any particular argument.

2. Each argument value is converted to its expected type using these rules:

a. If the type of the argument matches the type of the corresponding parameter, then no conversion is performed.

b. Otherwise, if the expected type is an atomic type (or a sequence of atomic types), the argument value is atomized, which results in a sequence of atomic values.

c. Each item in that sequence whose type is xdt:untypedAtomic is converted to the expected type of the corresponding parameter. When the function being invoked is one of the built-in functions defined by the Functions and Operators spec (see Section 10.9, “Functions and Operators”), if the expected type is numeric, then argument values whose types are xdt:untypedAtomic are converted to xs:double. (This last provision applies only to built-in functions because user-defined functions cannot declare parameters with an expected type of numeric.)

d. Each numeric item in the sequence that can be promoted to the expected atomic type using the type promotion rules (detailed in the XQuery language spec) is promoted.

e. Each item whose type is xs:anyURI that can be promoted to the expected atomic type using the type promotion rules is promoted.

f. Each item whose type is neither xdt:untypedAtomic, a numeric type, nor xs:anyURI is converted to its expected type as though the XQuery cast operator had been used.

3. If the function being invoked is one of the built-in functions, it is evaluated using the converted argument values, and the result of the evaluation is either a value of the function’s declared type or an error. If the function is a user-defined function, then the function body is evaluated, with each argument value bound to the corresponding parameter, and the value returned by the function body is converted to the function’s declared type using the argument conversion rules described earlier (an error is raised only if the conversion fails).

An example of a function declaration and a corresponding function call is given in Example 11-3.

Example 11-3   Function Call Example

image
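The argument conversion rules can also be seen at work in a small sketch of our own. The function local:stars and its parameters are hypothetical, and we assume that the local prefix is predeclared for user-defined functions and that the rating element carries no schema-derived type.

```xquery
declare function local:stars($title as xs:string, $rating as xs:integer) as xs:string
{
  fn:concat($title, ": ", fn:string($rating), " stars")
};

(: The argument types match the parameter types, so no conversion is needed. :)
local:stars("Mars Attacks!", 3),                (: "Mars Attacks!: 3 stars" :)

(: The element is atomized to an untyped atomic value,
   which is then converted to the expected type, xs:integer. :)
local:stars("Below", <rating>4</rating>)        (: "Below: 4 stars" :)
```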

11.5.8 Filter Expressions

A filter expression is merely any primary expression followed by zero or more predicates, as specified in Grammar 11-9. The result of a filter expression comprises each item returned by the primary expression for which all of the predicates are true. If there are no predicates, then the value of the filter expression is exactly the same as the value of the primary expression.

Grammar 11-9   Syntax of Filter Expressions

image

The order of items in the result of the filter expression is the same as the order in which those items appeared in the primary expression.
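A minimal sketch of filter expressions, using a sequence of integers as the primary expression (the values are arbitrary and chosen only for illustration):

```xquery
(
  (: No predicates: the value is just the primary expression, 1 through 5. :)
  (1 to 5),

  (: One predicate: keeps the even values 2, 4, 6, 8, 10, in their original order. :)
  (1 to 10)[. mod 2 eq 0],

  (: Two predicates: an item survives only if both are true, giving 6, 8, 10. :)
  (1 to 10)[. mod 2 eq 0][. gt 5]
)
```

Within each predicate, the context item (.) is bound in turn to each item produced by the primary expression.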

You were exposed to predicates in Chapter 9, “XPath 1.0 and XPath 2.0,” so they are not addressed in detail in this chapter. Recall that a predicate is an expression whose value is a Boolean value, such as a comparison expression. Below, Section 11.5.11 discusses Boolean-valued expressions that are used in predicates.

11.5.9 Node Sequence-Combining Expressions

Now that we’ve covered the simpler kinds of expressions, let’s look at expressions that combine node sequences – union, intersect, and except. The syntax used for these sequence-combining expressions is seen in Grammar 11-10. The operands of these three operators are node sequences, not values (as they are in SQL, for example). Consequently, it is not possible to evaluate an expression such as (1, 2) union (2, 1) – the contents of the two sequences are values and not nodes.

One of these, using the union operator (equivalently, the vertical bar operator, |), returns a sequence containing all nodes that appear in either of its node sequence operands. Another, using the intersect operator, returns a sequence containing only those nodes that appear in both of its node sequence operands. The third, using the except operator, returns a sequence containing all nodes that appear in its first node sequence operand but not in the second.

Grammar 11-10   Syntax of Node Sequence-Combining Expressions

image

All three of these expressions eliminate duplicate nodes (based, of course, on node identity) and, unless the ordering mode is unordered, return the result node sequence in document order. Example 11-4 illustrates some of these expressions. For the purposes of these examples, assume that “A” represents the movie node corresponding to the movie Absolute Power, “B” represents the movie node for the film Below, and “C” represents the movie node of the film Corruption, and also assume that the document containing these three nodes happens to contain them in that sequence: A followed by B followed by C. The XQuery comments preceding each example indicate the value computed by the expression.

Example 11-4   Node Sequence-Combining Examples

image

image
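The following self-contained sketch is in the spirit of those examples, though it is not a reproduction of Example 11-4. The movies wrapper element and the variables $a, $b, and $c are our own assumptions, constructed inline so that the query can be run without any input document.

```xquery
let $doc :=
  <movies>
    <movie>Absolute Power</movie>
    <movie>Below</movie>
    <movie>Corruption</movie>
  </movies>
let $a := $doc/movie[1]
let $b := $doc/movie[2]
let $c := $doc/movie[3]
return (
  (: Result is in document order regardless of operand order:
     "Absolute Power, Below, Corruption" :)
  string-join((($c, $a) union $b)/string(.), ", "),

  (: Only the node present in both operands: "Below" :)
  string-join((($a, $b) intersect ($b, $c))/string(.), ", "),

  (: Nodes in the first operand but not the second:
     "Absolute Power, Corruption" :)
  string-join((($a, $b, $c) except $b)/string(.), ", "),

  (: Duplicates are eliminated by node identity, so the count is 1. :)
  count(($a, $a) | $a)
)
```

Replacing any operand with a sequence of atomic values, as in (1, 2) union (2, 1), raises a type error instead.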

11.5.10 Arithmetic Expressions

Let’s get back to basics. XQuery supports the basic kinds of arithmetic operations that most programming languages provide: addition, subtraction, multiplication, and division; it also provides a modulus operator. In addition (pun noted, but not intended), XQuery provides unary plus and minus operators.

Because XQuery is not primarily intended as a mathematical computation language, it does not provide built-in operators for operations such as exponentiation, extraction of roots, or logarithmic computations. (However, we do anticipate that each community will develop libraries of user-defined functions to support the operations on which their work depends.)

The syntax of XQuery’s arithmetic operators is shown in Grammar 11-11, and a few examples are seen in Example 11-5.

Grammar 11-11   Grammar of Arithmetic Expressions

image

Because the hyphen (-) is used as the subtraction operator, as the negation operator (“unary minus”), and as a valid character in XML names, XQuery requires that the subtraction operator be preceded by white space if it could possibly be mistaken as part of the preceding token. For example, “MyStars-1” is a valid XML name; if your intent is to subtract 1 from the rating of a film (number of stars) given by a reviewer, then XQuery requires that to be expressed something like this: “MyStars -1” or “MyStars - 1.”

In an AdditiveExpr, the plus sign (+) indicates addition of the values of the two operands, and the hyphen, also called a minus sign (-), specifies the subtraction of the value of the second operand from the value of the first.

In a MultiplicativeExpr, the asterisk (*) means multiplication of the values of the two operands. In many programming languages, a slash (/) is used to indicate division. However, XQuery uses the slash as a path expression operator, so the keyword “div” was chosen to indicate division and a second keyword, “idiv,” indicates integer division – specifically, division of the value of the first operand by the value of the second. The keyword “mod” indicates the modulus operation (which, simplified, means to return the remainder of a division operation instead of returning the quotient).

In Example 11-5, we use XQuery comments to state the result of the example.

Example 11-5   Examples of Arithmetic Expressions

image

image
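Here is an additional sketch of the arithmetic operators (our own illustration, separate from Example 11-5). The variable name $MyStars echoes the discussion of white space above; its value is arbitrary.

```xquery
let $MyStars := 4
return (
  7 div 2,        (: division of two integers yields the xs:decimal 3.5 :)
  7 idiv 2,       (: integer division yields 3 :)
  7 mod 2,        (: the remainder is 1 :)
  -7 idiv 2,      (: integer division truncates toward zero, giving -3 :)
  -7 mod 2,       (: the remainder takes the sign of the dividend, giving -1 :)
  2 + 3 * 4,      (: multiplication binds more tightly than addition, giving 14 :)
  $MyStars - 1    (: white space before the hyphen makes this a subtraction, giving 3 :)
)
```

Had the last expression been written as $MyStars-1, the processor would have looked for a variable named MyStars-1 and, finding none, reported an error.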

In XQuery, numbers are handled using rules that are needed when mixing integers, decimal numbers, and floating-point numbers. These rules say that any arithmetic operation that involves numbers of two different data types requires one number to be “upcast,” or “promoted,” to the type of the other. XQuery deals with only four of the many XML Schema numeric types – xs:integer, xs:decimal, xs:float, and xs:double. Even though, in XML Schema, there is no type derivation relationship between xs:decimal, xs:float, and xs:double (but xs:integer is derived from xs:decimal), XQuery treats them as though there were such a relationship. Example 11-6 illustrates the numeric type hierarchy and provides a couple of examples.

Example 11-6   Numeric Type Promotion

Type promotion hierarchy:

image

7

Double required, integer provided:

image

Decimal required, integer provided:

image

Float required, decimal provided:

image

Decimal required, double provided:

image

(“demotion” is not supported)


7 Because XML Schema defines xs:integer as a subtype of xs:decimal, every value of type xs:integer is a value of type xs:decimal; therefore, this relationship is not technically a type promotion.

If an operation requires promotion of one value to the type of the other value, then xs:integer values can be promoted to any of the other three types, xs:decimal values can be promoted to xs:float or xs:double, and xs:float values can be promoted to xs:double. If the variable $i is of type xs:integer and the variable $j is of type xs:float, then the expression $i - $j requires that the value of $i be promoted to xs:float before the operation is performed; it also requires the result of the operation to be of type xs:float.
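The promotion rules can be observed directly with the instance of operator. The sketch below uses the variables $i and $j from the preceding paragraph; the particular values are our own choice.

```xquery
let $i := xs:integer(3)
let $j := xs:float(1.5)
return (
  (: $i is promoted to xs:float, and the result is an xs:float: true :)
  ($i - $j) instance of xs:float,

  (: integer and decimal operands produce a decimal result: true :)
  (1 + 2.5) instance of xs:decimal,

  (: decimal and double operands produce a double result: true :)
  (2.5 + 1e0) instance of xs:double,

  (: float and double operands produce a double result: true :)
  (xs:float(1) + 1e0) instance of xs:double
)
```

The parentheses around each arithmetic expression matter, because instance of binds more tightly than the arithmetic operators.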

Each operand of an arithmetic operator is evaluated in four steps:

1. The operand is atomized as described earlier in this chapter. (Because the operand is atomized, it is possible to provide a node – instead of an atomic value – as an operand. This allows the use of, say, element nodes directly as operands of an operator.)

2. If the atomized operand is the empty sequence, then the result of the operation is the empty sequence. Note that implementations are not required to evaluate the other operand, but they are permitted to do so if (for example) they want to exhaustively discover errors.

3. If the atomized operand is a sequence of length greater than 1, a type error is raised.

4. If the atomized operand is of type xdt:untypedAtomic, it is converted to xs:double; if that conversion fails (e.g., the value is the string “Midnight Cowboy”), then an error is raised.

If, after these steps have been applied to both operands, one or both operands are not of a type suitable for the operation – such as an effort to subtract a value of type xs:decimal from a value of type xs:IDREF – an error is raised. Even when the operands are both of suitable types, errors can be raised by the operation itself, such as an attempt to divide by zero.
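The operand evaluation steps above can be sketched as follows. The stars element is hypothetical, and we assume a processor that does not implement the optional static typing feature (such a processor might reject an operand that is statically known to be empty).

```xquery
(
  (: Step 2: an empty operand makes the whole result empty, so the count is 0. :)
  count(() + 1),

  (: Steps 1 and 4: the element is atomized to an untyped value "4",
     which is converted to xs:double before the addition; the sum is 5. :)
  <stars>4</stars> + 1,

  (: Confirms that the untyped operand was converted to xs:double: true :)
  (<stars>4</stars> + 1) instance of xs:double
)

(: Each of the following would raise an error instead of returning a value:
     (1, 2) + 1                          step 3, operand longer than one item
     <title>Midnight Cowboy</title> + 1  step 4, conversion to xs:double fails
     1 div 0                             integer division by zero :)
```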

As we have seen, XQuery has a rich set of arithmetic operators, but that set will be complemented by function libraries that provide even more functionality.

11.5.11 Boolean Expressions: Comparisons and Logical Operators

There are two kinds of expressions in XQuery that produce Boolean results. One kind, comparison expressions, provides the ability to compare two values. The other, logical expressions, allows the combination of Boolean values, such as those produced by comparisons.

Comparison expressions can be divided into three categories: value comparison, general comparison, and node comparison. Value comparisons are used to compare two single values, general comparisons are (for all practical purposes) quantified comparisons – also called existential comparisons – that can be used to compare sequences of any length, and node comparisons are used to compare two nodes.

The grammar of comparison expressions is presented in Grammar 11-12 (slightly modified for clarity from the grammar as published in the XQuery specification). Note that the three types of comparison use different sets of operators. It’s tempting to conclude that the ordinary comparison operators (=, >, etc.) could have been used for all three types; however, if XQuery had done so, it would be impossible in many instances to determine whether any given comparison was intended to be a value comparison, a general comparison, or a node comparison.

Grammar 11-12   Comparison Expression Grammar

image

Value comparisons require that the value of each operand be determined. The steps are very similar to those involved in determining the values of the operands of arithmetic expressions, except for the fourth step:

1. The operand is atomized as described earlier in this section.

2. If the atomized operand is the empty sequence, then the result of the operation is the empty sequence. Note that implementations are not required to evaluate the other operand, but they are permitted to do so if (for example) they want to exhaustively discover errors.

3. If the atomized operand is a sequence of length greater than 1, a type error is raised.

4. If the atomized operand is of type xdt:untypedAtomic, it is converted to xs:string. While operand type conversion for arithmetic operators naturally falls back to a numeric type (xs:double), comparisons are more often based on comparing string values than strictly numeric values.

If the values of the two operands have types that are compatible for the purposes of comparison, then they are compared. If the value of the first operand is equal to, not equal to, less than, less than or equal to, greater than, or greater than or equal to the value of the second operand, then the comparison using the “eq,” “ne,” “lt,” “le,” “gt,” or “ge” operator, respectively, is true; otherwise, the comparison is false. Some value comparisons are illustrated in Example 11-7.

Example 11-7   Value Comparison Examples

image
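A few further value comparisons, offered as our own sketch in addition to Example 11-7. The year element is hypothetical, and the string comparison assumes that the default collation is the Unicode codepoint collation.

```xquery
(
  3 eq 3,                        (: true :)
  3 ne 4,                        (: true :)
  "Below" lt "Corruption",       (: true under the codepoint collation :)

  (: The element is atomized to an untyped value, which is converted to
     xs:string and then compared with the string literal: true :)
  <year>1997</year> eq "1997",

  (: An empty operand yields the empty sequence, so the count is 0. :)
  count(() eq 1)
)

(: Each of the following would raise a type error:
     <year>1997</year> eq 1997   the untyped operand becomes an xs:string,
                                 which cannot be compared with an integer
     (1, 2) eq 1                 an operand longer than one item :)
```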

General comparisons, as we said earlier, act as existential comparisons. By “existential comparison” we mean this: If there exists at least one value in the sequence that is the value of the first operand that has the proper comparison relationship (using the value comparison rules!) with at least one value in the sequence that is the value of the second operand, then the general comparison is true; otherwise, it is false.

In principle, every value belonging to each of the two sequences is compared to every value in the other sequence. In practice, implementations are very often able to determine the result without actually doing so many comparisons, so the rules of XQuery allow implementations to return the result without compulsively comparing every combination of values. One consequence of this permissive rule is that there may be errors that would result from comparing some particular value in the first sequence with some other specific value in the second sequence, but the implementation might return a true/false result and not raise the error. The second example in Example 11-8 illustrates such a situation.

Of course there are a few rules to cover the relationships between the types of the operands:

• If one operand is of type xdt:untypedAtomic and the other is of any numeric type, then they are both converted to xs:double.

• If one operand is of type xdt:untypedAtomic and the other is of either type xdt:untypedAtomic or type xs:string, then they are converted to xs:string as required.

• If one operand is of type xdt:untypedAtomic and the other is of neither xdt:untypedAtomic, xs:string, nor any of the numeric types, then the xdt:untypedAtomic operand is converted to the runtime type of the other operand.

Example 11-8 provides a few sample general comparison expressions.

Example 11-8   General Comparison Examples

image

The final example is a valid general comparison, even though the operands are single values – remember that a single value is a singleton sequence containing that value.

General comparisons do not behave like comparisons that use the same operators in most other languages (it’s value comparisons that behave like comparisons in those other languages, albeit with different operators). The differences are caused by the existential semantics of general comparison. Therefore, even though “(1, 2) = (2, 3)” is true and “(2, 3) = (3, 4)” is true, “(1, 2) = (3, 4)” is false! That is, general comparisons are not transitive. Similarly, both “(1, 2) = (2, 3)” and “(1, 2) != (2, 3)” are true – inverted operators do not imply inverted results.
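These existential semantics are easy to verify for yourself. The following sketch gathers the comparisons just discussed into one query and adds two that exercise the type conversion rules; the year element is our own hypothetical example.

```xquery
(
  (1, 2) = (2, 3),              (: true, because 2 appears in both sequences :)
  (2, 3) = (3, 4),              (: true, because 3 appears in both sequences :)
  (1, 2) = (3, 4),              (: false, so "=" is not transitive :)
  (1, 2) != (2, 3),             (: true, because (for example) 1 differs from 2 :)

  (: An untyped operand compared with a number is converted to xs:double: true :)
  <year>1997</year> = 1997,

  (: An untyped operand compared with a string is converted to xs:string: true :)
  <year>1997</year> = "1997"
)
```

Contrast the last two lines with the value comparison operator eq, for which comparing the same untyped element with the integer 1997 raises a type error.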

Node comparisons are different from both value comparisons and general comparisons, in that they do not compare values at all, but compare nodes based on their identities. In a node comparison, if either operand evaluates to a sequence of more than one node, then an error is raised. If either operand evaluates to an empty sequence, then the result of the comparison is also the empty sequence (node comparisons can have three values: true, false, and the empty sequence).

If the two operands have the same identity – that is, they are the same node – then the “is” comparison is true; otherwise, that comparison is false. If the first operand is a node that appears earlier in document order than the node identified by the second operand, then the “<<” comparison is true and the “>>” comparison is false. There is an exception to this rule: If the ordering mode is unordered (see Section 11.5.13, “Ordered and Unordered Expressions”), then the results are nondeterministic, because document order is not maintained. Example 11-9 demonstrates these principles.

Example 11-9   Node Comparison Examples

image

image
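A compact sketch of all three node comparison operators follows. It is our own illustration rather than a copy of Example 11-9, and the movies document and variable names are assumptions made so that the query is self-contained.

```xquery
let $doc := <movies><movie>Absolute Power</movie><movie>Below</movie></movies>
let $a := $doc/movie[1]
let $b := $doc/movie[2]
return (
  $a is $a,                (: true, the same node :)
  $a is $b,                (: false, two different nodes :)
  $a << $b,                (: true, $a precedes $b in document order :)
  $a >> $b,                (: false :)

  (: Each constructor creates a node with a new identity, so two
     constructed elements are never the same node: false :)
  <movie/> is <movie/>,

  (: An empty operand yields the empty sequence, so the count is 0. :)
  count($doc/nosuch is $a)
)
```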

As you have seen in this discussion of comparison expressions, the result of such an expression is either a Boolean value (true or false) or the empty sequence. A frequent requirement in applications is to combine the results of multiple comparisons: “I want to find the MPAA rating of all movies whose titles contain the word ‘Outlaw’ that were released after 1965.” In this example, two comparisons are combined: “titles contain the word ‘Outlaw’” and “released after 1965” – both must be true for the movies I want to find. The ability to combine multiple comparison expressions is provided in XQuery with logical expressions.

A logical expression is an expression that permits the combination of Boolean values to achieve an aggregated Boolean result. Logical expressions in XQuery operate on the effective Boolean value of their operands and, naturally, follow the rules of Boolean algebra. XQuery defines only two Boolean operators: and and or. (Many languages include a third Boolean operator: not. In XQuery, the functionality of that operator is provided by a function, fn:not.)

Grammar 11-13   Grammar of Logical Expressions

image

The behaviors of these two operators are defined in Table 11-3 and Table 11-4. The cells that contain “true or error” or “false or error” imply that an implementation may choose to determine the value of the expression from the value of the one operand that does not generate an error, or it may choose to raise an error instead. It’s worth pointing out that XQuery, like SQL and many other programming languages, but unlike a few popular languages such as C, does not require evaluation of the operands of logical expressions in any particular order. Instead, implementations are free to reorder the evaluation of the operands for such reasons as query optimization.

Table 11-3

Semantics of or

image

Table 11-4

Semantics of and

image

Example 11-10   Examples of Logical Expressions

image
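The sketch below, which supplements Example 11-10, shows the two logical operators, the fn:not function, and the use of effective Boolean values. All of the operand values are arbitrary.

```xquery
(
  (1 eq 1) and ("a" eq "a"),          (: true :)
  (1 eq 2) or (2 eq 2),               (: true :)
  fn:not((1 eq 1) and (1 eq 2)),      (: true, the negation of false :)

  (: Operands need not be comparisons. The effective Boolean value of a
     zero-length string and of the number 0 is false: false :)
  "" or 0,

  (: The effective Boolean value of a sequence that starts with a node,
     and of a non-empty string, is true: true :)
  <a/> and "x"
)

(: The following is a "true or error" case from Table 11-3. An implementation
   may return true, or it may raise the division-by-zero error:
     (1 idiv 0 eq 1) or fn:true() :)
```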

Comparison expressions and logical expressions can be combined in powerful ways, making up arbitrarily complex predicates that are used in the FLWOR expression’s where clause and in the predicates of path expressions. But users new to XQuery must be careful to use the correct operator when comparing two items. Remember that most languages use the symbol “=” to mean “this value is equal to that value,” while XQuery uses it in an existential sense to mean “any item in this sequence is equal to any item in that sequence.” In order to get the semantics that “=” provides in other languages (including gaining protection from situations where it is possible for one or both operands to be a sequence of length greater than 1), XQuery expressions use the “eq” operator instead.

11.5.12 Constructors – Direct and Computed

One of the strengths of XQuery (over, say, XPath) is its ability to construct XML nodes and thus to build up brand new XML fragments or complete documents in the result of a query. In addition to the constructor functions and sequence constructors that we discussed earlier in this section, XQuery provides two different classes of constructors for nodes. Document nodes can be constructed using only one class of constructors, while five of the other six node types can be constructed by both classes. XQuery does not represent namespace bindings as nodes, so there is no way in XQuery to construct namespace nodes (the seventh node type).

The two classes of node constructor are direct constructors and computed constructors. Direct constructors use an XML-like syntax, while computed constructors use a syntax based on enclosed expressions. (An enclosed expression is an expression enclosed within curly braces: {…}.)

Direct Constructors

Direct constructors are, in most ways, nothing more than well-formed XML that appears in an XQuery. We say “in most ways” because – as you’ll read later – it is possible to supply the content of elements and the values of attributes (but not element or attribute names) using enclosed expressions. A very simple example of a direct element constructor is:

image

The syntax for direct constructors appears in Grammar 11-14. Throughout this grammar, we have omitted specific indication of where white space is required or permitted – such indications merely clutter up the grammar and can be obtained from the published XQuery specification.

Grammar 11-14   Grammar of Direct Constructors

image

image

image

There’s a lot of detail in that grammar that we don’t need to examine, but we encourage our readers to ensure that they understand most of it.

Document nodes cannot be created using direct constructors, so Grammar 11-14 does not define any syntax related to document node construction. Neither does it include syntax for direct construction of text nodes – that’s done simply by the inclusion of text as the content of a directly-constructed element.

Let’s examine the various direct constructors one at a time. The constructors included in this discussion are: direct element constructors (and the direct attribute constructors they might contain), direct comment constructors, and direct processing instruction constructors. (Remember that there is no way to construct document nodes using direct constructors, and no way at all to construct namespace nodes in XQuery.) The direct comment constructors and direct processing instruction constructors are simple, so let’s get them out of the way before we explore direct element constructors.

An XML comment looks like this:

image

The text of the comment (comment-text) is restricted because of the XML rule that prohibits two consecutive hyphens (--) in a comment. The syntax of the DirCommentConstructor, found near the end of Grammar 11-14, is an exact copy of the corresponding grammar production for comments in the XML Recommendation.8 Therefore, in XQuery, a direct comment constructor is nothing more than an XML comment.

An XML processing instruction (frequently abbreviated “PI,” which is not to be confused with pi, π) is superficially similar in appearance to an XML comment. A PI looks like this:

image

The content of the PI (PI-content) – which is optional – is also restricted because of an XML rule, this one prohibiting a question mark followed by a right angle bracket (?>) in the content. In addition, following rules imposed by the XML Recommendation, a PI’s target (PI-target) must not be spelled with a leading “X” or “x” followed by an “M” or “m” followed by an “L” or “l.” The syntax of the DirPIConstructor, found at the end of Grammar 11-14, is an exact copy of the corresponding grammar production for processing instructions in the XML Recommendation. Consequently, in XQuery, a direct PI constructor is exactly the same as an XML PI.

Direct element constructors are a bit more involved, but they still closely follow the syntax of elements in XML. That is, like ordinary elements in XML, empty elements can be written thusly:

image

while nonempty elements are written like this:

image

The element-content is optional, which makes it possible to write an empty element using the start tag/end tag notation used by nonempty elements.

Both empty elements written using the short notation and nonempty elements can have an attribute list immediately following the tag-name in the start tag. In fact, the XML Recommendation considers the attribute list to be part of the start tag itself; XQuery calls it out separately for expositional purposes.

An attribute list is, naturally, a list of attributes. In this case, it is a list of direct attribute constructors. A direct attribute constructor is, as you see in Grammar 11-14, an attribute name followed by an equal sign, and a quoted attribute value. It’s important to note that the quoted attribute value is permitted to contain enclosed expressions by which part or all of the attribute value is computed at query evaluation time! This computation of the attribute’s value does not change the constructed nature of the attribute constructor.

The content of a nonempty element can include several different objects. It can contain other direct constructors, including direct element constructors, direct comment constructors, and direct PI constructors. It can contain CDATA constructors that are identical to the CDATA sections defined in the XML Recommendation (recall that the Data Model does not support CDATA sections directly, but represents them as ordinary text nodes), as well as arbitrary character sequences (excluding left and right braces, ampersands, and left angle brackets: { }&<) and enclosed expressions. The inclusion of an enclosed expression makes it possible for part or all of an element’s content to be computed at query evaluation time, which does not change the constructed nature of the element constructor. In Example 11-11, we’ve illustrated some direct element constructors – be sure to note the strong similarity to ordinary elements in XML documents. It’s also important to say that the XQuery comments have absolutely no effect on the constructed XML – they simply disappear from the results of the construction.

Example 11-11   Direct Constructor Examples

image

image
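To supplement Example 11-11, here is a small sketch of our own that combines the direct constructors discussed in this section. The element names, the PI target display, and the variables $title and $year are all hypothetical.

```xquery
(: XQuery comments such as this one belong outside element content;
   inside element content they would be treated as literal text. :)
let $title := "Below"
let $year := 2002
return
  <movie year="{$year}" id="m-{$year}">
    <!-- a direct comment constructor -->
    <?display mode="brief"?>
    <title>{$title}</title>
    <note>{{braces}} are doubled to appear literally</note>
    <![CDATA[Ampersands (&) and <angle brackets> need no escaping here.]]>
  </movie>
```

The constructed movie element has a year attribute of "2002" and an id attribute of "m-2002", both computed from enclosed expressions. Its title child contains the text "Below", and the doubled braces in the note element each produce a single literal brace. The CDATA section becomes ordinary text in the result.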

Computed Constructors

Computed constructors make it possible for XQuery expressions to generate XML even when certain key information – such as the name of an element or of an attribute – is unknown when the XQuery expression is written. Computed constructors have a completely different look than direct constructors. There is no effort to make the syntax look XML-like, because the focus is on ease of specifying the information that must be computed in order to create the node, especially the names of element and attribute nodes.

The grammar of computed constructors is presented in Grammar 11-15.

Grammar 11-15   Grammar of Computed Constructors

image

image

The most obvious difference between the syntax in Grammar 11-14 and that in Grammar 11-15 is the absence of all those angle brackets used by direct constructors. Instead of using the syntax of XML to create the various node types, computed constructors require an explicit keyword to specify the kind of node being created. That keyword is followed in some cases either by the name of the node or by an expression whose value is to be used as the name of the node. The node-type keyword is also followed by an expression supplying additional information needed to construct a node. Let’s look at each in turn.

A computed comment constructor is very straightforward:

image

The result of that constructor is this XML comment:

image

Note that the enclosed expression is a character string literal in this case. It would have been incorrect to have used the enclosed expression {Computed constructors are vital in XQuery} (that is, omitting the quotes) because the content of the enclosing braces would not correspond to any valid XQuery expression. Another computed comment constructor is:

image

in which the computed comment’s expression is an invocation of the built-in function fn:concat( ) to concatenate the value of the variable $typeVar with a character string literal. If the value of $typeVar happened to be “Direct,” then the comment constructed by this computed comment constructor would be:

image

Computed text constructors are just as straightforward as computed comment constructors. The differences are the name of the constructor and the precise result. The computed text constructor:

image

results in a text node whose value is:

image

Note that the keyword text is followed by an enclosed expression, which means that the material between the braces must be an expression, which in this example is a string literal. As with computed comment constructors, the enclosed expressions of computed text constructors can contain subexpressions whose values must be computed at query evaluation time. It is possible to construct a text node whose value is a zero-length string; such nodes, when used as the content of a constructed element or document node, will simply disappear. Incidentally, two adjacent text nodes in the content of a constructed element are merged into a single text node.

Computed PI constructors are only slightly more complex, the added complexity arising entirely from the fact that processing instructions have targets. If the value of the variable $tgtVar is “xml-stylesheet”, then the following two computed PI constructors are equivalent in their effects:

image

Both of those computed constructors produce the following XML PI:

image

Perhaps obviously, the first of those computed PI constructors could have just as easily been written as a direct PI constructor. The choice of which to use in situations like this is largely a matter of personal style.

Looking at Grammar 11-15, you’ll notice that computed attribute constructors are not contained within computed element constructors, but are true peers to computed element constructors. Contrast this with Grammar 11-14, in which attributes could be constructed only as part of a directly constructed element. An implication of this fact is that you are able to create stand-alone attributes – part of the XQuery Data Model, but not allowed in XML or in the Infoset.

To construct an attribute, you’d write something like:

image

or:

image

A computed attribute constructor that creates an attribute of an element being created with a computed element constructor is expressed as part of the computed element’s content. (Again, contrast this with the treatment of attributes in Grammar 11-14.) A computed element constructor looks like this:

image

image

When the value of the variable $characterAge is 24, that computed element constructor produces the following element:

image

The final kind of computed constructor is the computed document constructor. Its syntax is exactly the same – except for the name of the constructor – as the computed text and comment constructors, but its content would naturally be a bit more complex. And, of course, the result is a complete document. A useful exercise for the reader is to write a computed document constructor for any of the XML documents found in this book.
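As a starting point for that exercise, the following sketch of our own combines every kind of computed constructor in a single query. The variable names and all of the content are hypothetical; none of it is taken from the examples shown earlier in this section.

```xquery
let $elemName := "character"
let $age := 24
let $target := "xml-stylesheet"
return
  document {
    (: The PI target is computed from a variable. :)
    processing-instruction { $target } { 'type="text/xsl" href="style.xsl"' },

    comment { fn:concat("Computed", " constructors built this document") },

    (: The element name is computed; its content is a sequence of nodes. :)
    element { $elemName } {
      attribute age { $age },            (: name given literally :)
      attribute { "role" } { "lead" },   (: name given by an expression :)
      text { "Ripley" }
    }
  }
```

The result is a document node whose children are the processing instruction, the comment, and a character element that carries the attributes age="24" and role="lead" and the text "Ripley". Note that the attribute constructors appear before the other content of the element, as they must.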

11.5.13 Ordered and Unordered Expressions

XML, used as a markup language, creates documents that are inherently ordered. Think about a book that is marked up in XML – the second chapter must always follow the first chapter, and the paragraphs in each chapter must always be in the sequence in which the author wrote them. As a result, XQuery treats the XML that it queries as ordered (and, in particular, it handles that XML in document order) unless instructed to do otherwise. One way in which an XQuery can be instructed to “do otherwise” is through the order by clause (see Section 11.6.3), through which the author of an XQuery forces the results of an expression to be ordered according to specified criteria.

However, because XQuery is sometimes applied to information that doesn’t represent books or other traditional “documents” – such as relational data, as you’ll see in Chapter 15, “SQL/XML” – the notion of “document order” is not always a meaningful one. Instead, any ordering to be applied to query evaluation is imposed as part of the query (such as the order by clause just mentioned) or is an artifact of the optimizations that the query processing engine applies to the evaluation of that particular query, often based on factors such as indexes or other physical storage facets.

In order to provide applications with the ability to write queries that selectively bypass considerations of inherent ordering, XQuery provides, as primary expressions, both ordered expressions and unordered expressions. The syntax of these two expressions appears in Grammar 11-16.

Grammar 11-16   Syntax of ordered and unordered Expressions

image

In Section 11.2.3, our description of XQuery’s static context included a component named “ordering mode.” When either an ordered expression or an unordered expression appears as a part of an XQuery expression, the ordering mode in the static context is set to ordered or unordered, respectively, for the lexical scope of the Expr that appears between the curly braces ({}). Of course, that Expr can be any XQuery expression and thus can have arbitrarily deep nesting of other expressions, including other ordered and unordered expressions.

The ordering mode affects the behavior of most step expressions (as discussed in Chapter 9, “XPath 1.0 and XPath 2.0”), the set operators (union, intersect, and except), and FLWOR expressions that don’t have an order by clause. If the ordering mode of those expressions is “ordered,” then the node sequences that they return are in document order; if the ordering mode is “unordered,” then the node sequences are in an implementation-dependent order. (Note, however, that the ordering mode has no effect on elimination of duplicate nodes from those node sequences.) Because the order of nodes in those node sequences is implementation-dependent, the behavior of certain functions, such as fn:position( ), as well as numeric predicates in path expressions, is nondeterministic.

In addition to ordered and unordered expressions, XQuery provides the fn:unordered( ) function that takes any sequence (not necessarily of nodes) and returns it in a nondeterministic order. This function is not a “randomizing” function – that is, it might well return the sequence in its original order. It merely gives permission to the XQuery evaluation engine to reorder the sequence if necessary for reasons such as performance optimization.
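A brief sketch of these three facilities follows. The element names are our own, and we deliberately show only results that do not depend on the order chosen by the implementation.

```xquery
let $doc := <r><m>A</m><m>B</m><m>C</m></r>
return (
  (: In ordered mode the union is returned in document order: "A", "C" :)
  ordered { ($doc/m[3] | $doc/m[1])/string(.) },

  (: In unordered mode the order is implementation-dependent, but
     duplicate nodes are still eliminated, so the count is 3. :)
  count(unordered { $doc/m | $doc/m }),

  (: fn:unordered accepts any sequence, including atomic values;
     it may reorder the items, but it never adds or drops any. :)
  count(fn:unordered((3, 1, 2)))
)
```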

11.5.14 Conditional Expression

Generally speaking, a conditional expression is one that returns one of two values based on the evaluation of a predicate. (This is not the “if statement” used by imperative languages that causes execution to take one of two branches.) In most languages offering conditional expressions, as in XQuery, those expressions are defined using the keyword if. In fact, XQuery’s grammar uses the BNF nonterminal symbol “IfExpr” to define such expressions, as seen in Grammar 11-17.

Grammar 11-17   Conditional Expression Grammar

image

When an IfExpr is evaluated, the IfTestExpr is first evaluated to find its effective Boolean value, as described in Section 11.2. If the effective Boolean value is true, then the result of the IfExpr is the result of evaluating the IfTrueExpr. Otherwise, the result of the IfExpr is the result of evaluating the IfFalseExpr. Example 11-12 illustrates how we might decide which of two movies to watch tonight based on which of two other movies was released first.

Example 11-12   Conditional Expression Example

image
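A conditional expression along these lines might look like the following sketch. The two movie elements, their titles, and the yearReleased element are assumptions made for illustration.

```xquery
(: Hypothetical data: titles and release years are invented for this sketch. :)
let $first  := <movie><title>Movie One</title><yearReleased>1979</yearReleased></movie>
let $second := <movie><title>Movie Two</title><yearReleased>1986</yearReleased></movie>
return
  (: The test is reduced to its effective Boolean value before a branch is chosen. :)
  if (xs:integer($first/yearReleased) lt xs:integer($second/yearReleased))
  then fn:string($first/title)
  else fn:string($second/title)
```

Note that the else branch is not optional in XQuery 1.0; an expression must always have a value, so both outcomes have to be supplied.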

11.5.15 Quantified Expressions

In XQuery, quantified expressions provide the ability to do existential quantification (“Does at least one of these values meet this criterion?”) and universal quantification (“Do all of these values meet this criterion?”). Let’s examine the syntax of quantified expressions in Grammar 11-18.

Grammar 11-18   Quantified Expression Grammar

image

image

The Quantifier keyword some causes existential quantification to be evaluated, while the keyword every causes universal quantification. Each QuantifiedInClause declares a variable, whose type may optionally be specified, and binds it to the sequence of items resulting from the evaluation of the QuantifiedBindingSequence expression.

A variable declared in one QuantifiedInClause can be used in the QuantifiedTestExpression, and even in the QuantifiedBindingSequence of the QuantifiedInClauses that follow its own QuantifiedBindingSequence. (Wow! What a mouthful.)

The result of a QuantifiedExpr that specifies some is true if at least one evaluation of the QuantifiedTestExpression results in a value of true, while the result of a QuantifiedExpr that specifies every is true only if every evaluation of the QuantifiedTestExpression results in true. When some is specified and the result of the QuantifiedBindingSequence is the empty sequence, the result is false. Why? Because there are no values for which the QuantifiedTestExpression can evaluate to true. By contrast, when every is specified and the result of the QuantifiedBindingSequence is the empty sequence, the result is true, because there are no values for which the QuantifiedTestExpression can evaluate to false.

Example 11-13 illustrates the use of quantified expressions.

Example 11-13   Quantified Expression Examples

image

image
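The rules above, including the behavior for an empty binding sequence, can be sketched with a handful of small expressions. The values are arbitrary and chosen only to make each outcome easy to check by hand.

```xquery
let $stars := (5, 3, 4)
return (
  some  $s in $stars satisfies $s gt 4,    (: true: 5 is greater than 4 :)
  every $s in $stars satisfies $s gt 4,    (: false: 3 and 4 are not :)
  some  $s in ()     satisfies $s gt 4,    (: false: there is nothing to test :)
  every $s in ()     satisfies $s gt 4,    (: true: there is nothing that can fail :)

  (: Two in-clauses; the second binding sequence uses the first variable. :)
  some $x in (1, 2), $y in ($x, $x * 10) satisfies $y eq 20   (: true when $x is 2 :)
)
```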

11.5.16 Expressions on XQuery Types

When you read Appendix C: XQuery 1.0 Grammar, you will see that a sequence type is the (data) type of something that can appear in a Data Model sequence – which is pretty much anything recognized by the Data Model. Sequence types can be specified (using the sequence type syntax) in variable declarations, as well as in function parameter declarations and results.

There are several other places in XQuery where sequence types are specified. In a couple of these, the sequence type itself is tested. The syntax of the five additional expressions in which sequence types are used – instance of, typeswitch, cast, castable, and treat – is shown in Grammar 11-19.

Grammar 11-19   Grammar of Expressions on Sequence Types

image

Let’s examine them one at a time.

An InstanceOfExpr is used to determine whether a given expression has a particular sequence type or not. Example 11-14 shows several uses of this expression, including one that illustrates how it might be put to use in the context of a larger expression.

Example 11-14   Examples Using instance of

image
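A few self-contained instance of tests, written here as a sketch with literal values rather than the book's sample data, show how the occurrence indicators and type derivation affect the outcome.

```xquery
(
  5 instance of xs:integer,          (: true :)
  5 instance of xs:decimal,          (: true: xs:integer is derived from xs:decimal :)
  "5" instance of xs:integer,        (: false: a string is not an integer :)
  (5, 6) instance of xs:integer,     (: false: two items, but exactly one is required :)
  (5, 6) instance of xs:integer+,    (: true: "+" permits one or more items :)
  () instance of xs:integer?,        (: true: "?" permits the empty sequence :)
  <movie/> instance of element()     (: true :)
)
```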

A query uses a TypeSwitchExpr to choose one of several expressions based on the dynamic (run-time) type of a test expression. In Example 11-15, you see an example of a type switch expression and an example of using it in context.

Example 11-15   Examples Using typeswitch

image

image
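The following sketch applies a type switch to each item of a mixed sequence. The values and the result strings are invented for illustration.

```xquery
for $v in (42, "forty-two", 4.2e1, <movie/>)
return
  typeswitch ($v)
    case xs:integer return "an integer"
    case xs:string  return "a string"
    (: A case clause may bind a variable that is typed as the matched type. :)
    case $e as element() return fn:concat("an element named ", fn:local-name($e))
    default return "something else"
```

The first case whose sequence type matches the dynamic type of the operand is chosen; the literal 4.2e1 is an xs:double, matches none of the cases, and therefore falls through to the default clause.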

It is often necessary to convert values of one data type to another data type, depending on the specific needs of a query. For example, your query might retrieve a string from some element, knowing that the string is a sequence of digits, convert the string to an integer, and then use that integer value in a computation. The cast expression provides that capability for XQuery, as illustrated in Example 11-16.

Example 11-16   Examples Using cast

image

There are several reasons why a cast might fail. The value being cast is first atomized. If atomization results in a sequence longer than 1, a run-time error is raised. If atomization results in an empty sequence and the sequence type was specified without the “?” (indicating that an empty sequence is permitted), a run-time error is raised. If the static type of the value being cast is not one that can be converted to the target type as indicated in the Functions and Operators specification, a run-time error is raised. Finally, if the actual value being cast cannot be converted to the target type, a run-time error is raised.
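The successful and failing cases just listed can be sketched as follows. The failing casts are shown only inside a comment, because evaluating any one of them would abort the whole query.

```xquery
(
  "42" cast as xs:integer,      (: the integer 42 :)
  xs:integer("42") + 1,         (: constructor-function form of the same cast; yields 43 :)
  () cast as xs:integer?        (: the empty sequence; "?" permits it :)

  (: Each of the following would raise an error if evaluated:
       (1, 2) cast as xs:integer         more than one item after atomization
       () cast as xs:integer             empty sequence, but no "?" on the target type
       "forty-two" cast as xs:integer    the value cannot be converted :)
)
```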

Which brings us to the next expression, the castable expression. Sometimes, in the context of a query, a cast is required under conditions where the query cannot guarantee that the values being cast are always appropriate for the target type. If such a cast is attempted, a run-time error is raised. But run-time errors are generally Not A Good Thing, especially when queries may be very complex and long-running – nobody wants her query to simply report “Error” after running for 15 minutes. (Unfortunately, XQuery 1.0 doesn’t have a way for a query to detect and handle errors – such as the try/catch blocks used in some languages.)

The castable expression allows you to write your queries in a self-protective manner, so casts that would fail at run time can be avoided.

Example 11-17   Examples Using castable

image
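A self-protective query of this kind might be sketched as shown here. The input strings and the sentinel value -1 are hypothetical choices made for the illustration.

```xquery
for $raw in ("1979", "unknown", "1986")
return
  if ($raw castable as xs:integer)
  then xs:integer($raw) + 1
  else -1   (: hypothetical sentinel for values that are not integers :)
```

Because the cast is attempted only after castable has returned true, the string "unknown" produces the sentinel rather than a run-time error.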

Frequently, your query knows that a value being used is always of a specific known type or of a type derived from that known type. For example, your query might have to deal with data off the web that has not been carefully constructed with attention paid to certain details. One element in the data, let’s call it RegionCode, might have instances of either xs:integer or my:DVDRegionCode, which is derived from xs:integer. But the query author wants to ensure that only values representing region codes (that is, whose type is my:DVDRegionCode but not xs:integer) are actually processed and is willing to endure a run-time error if any other sort of data is encountered. The first treat expression, illustrated in Example 11-18, is used to provide this capability. A more relaxed query author might decide that values of either xs:integer or my:DVDRegionCode are acceptable, but not values of xs:double or xs:float. The second treat expression in the example illustrates this usage.

Example 11-18   Examples Using treat

image

image
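As a sketch that needs no imported schema, the following uses only built-in types in place of the user-defined my:DVDRegionCode; that substitution is ours, made so the query is self-contained.

```xquery
let $code := 2
return (
  ($code treat as xs:integer) + 1,   (: 3: the dynamic type matches, so evaluation continues :)
  ($code treat as xs:decimal) + 1    (: also 3: xs:integer is derived from xs:decimal :)

  (: ($code treat as xs:string) would raise a dynamic error; unlike cast,
     treat never converts the value, it only asserts the value's type. :)
)
```

The parentheses around each treat expression matter: without them, the "+" that follows the type name would be read as an occurrence indicator (xs:integer+) rather than as the addition operator.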

The purpose of the treat expression is to allow a query author to provide a guarantee that instance data being queried have appropriate data types. It has particular value when the static typing feature is implemented and in use, because it provides information that the static type evaluation algorithms can use to determine and enforce the type correctness of expressions that use treat.

11.5.17 Validation Expression

Every XQuery expression that does not raise an error evaluates to some result, which is a sequence of items. As discussed both earlier in this chapter and in Chapter 10, that sequence might contain no items (the empty sequence), one item (singleton), or more than one item. The items in the sequence might be atomic values or complex values (such as XML documents or elements). When the result of an XQuery expression – whether it’s the “top-level” expression (that is, the QueryBody of an XQuery Module) or some expression nested deep within a QueryBody – is an XML document or an element, it’s very useful to know whether the result is valid or not (and even just how valid it is!) according to the associated XML Schemas.

In order to validate the result of an XQuery expression, the XML Schema or Schemas against which that result is to be validated either must be implicitly included in the environment in which the XQuery expression is evaluated, or it must have been imported via the use of the import schema clause in the XQuery prolog.

The result of successfully validating some node is a copy of that node (with a different identity!) in which it and all of its descendant nodes have been annotated with a validity assessment and a Data Model type. If validation fails, an error is raised.

XQuery supports validation of the results of expressions through the validate expression, whose syntax is given in Grammar 11-20.

Grammar 11-20   Syntax of validate expression

image

The syntax of the validate expression is deceptively simple. Why? Because it depends on the rules of XML Schema Part 1 to provide the detailed semantics of validation. The actual process of validation is discussed in Chapter 10 of this book.

The validated node either corresponds directly to the node being validated or, for a validated document node, to the only element child of the document node.

Example 11-19 provides a few examples of successful validation and validation efforts that will raise errors.

Example 11-19   Validation Expression Examples

image
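The sketch below shows the general shape of a validation, assuming the strict and lax keywords of the final XQuery 1.0 Recommendation. The schema namespace, the location hint, and the element names are all hypothetical, and the query can run only on an implementation that supports schema import and can locate such a schema.

```xquery
(: Hypothetical schema: the namespace URI and the location hint are placeholders. :)
import schema namespace mv = "http://example.com/movies" at "movies.xsd";

let $candidate := <mv:movie><mv:title>Movie One</mv:title></mv:movie>
return
  (: strict: a declaration for mv:movie must exist and the element must be valid against it. :)
  validate strict { $candidate }
```

Replacing strict with lax would validate only those elements and attributes for which declarations can be found, instead of raising an error when a declaration is missing.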

11.6 FLWOR Expressions

The FLWOR expression is arguably the very heart of XQuery. If you’ve programmed in SQL, it might be helpful if we told you that the FLWOR expression serves approximately the same purpose in XQuery that the SELECT expression serves in the SQL language. Section 11.10 contains some discussion of the relationship between the two languages and the two expressions.

In this rather lengthy section, we first describe the process of producing a tuple stream from the for and let clauses. Then we look at cutting down (or filtering) that tuple stream using a where clause, followed by seeing how to order the results using the order by clause. Finally, we cover the return clause, which defines what actually gets returned by the FLWOR expression – that is, what the expression evaluates to.

In the introductory parts of Appendix C: XQuery 1.0 Grammar, we see the syntax of XQuery’s FLWOR expressions. As you can infer from the syntax, the term FLWOR is derived from the first letter of the names of each immediate subexpression: for, let, where, order by, and return.

The XQuery 1.0 specification says that the FLWOR expression “supports iteration and binding of variables to intermediate results” and that “[this] kind of expression is often useful for computing joins between two or more documents and for restructuring data.”

With that in mind, let’s look at the purposes and behaviors of each of the subexpressions of FLWOR.

11.6.1 The for Clause and the let Clause

The XQuery spec introduces the for and let expressions with the unfortunate sentence “The purpose of the for and let clauses in a FLWOR expression is to produce a tuple stream in which each tuple consists of one or more bound variables.” Those of you familiar with relational theory will certainly recognize the word tuple, as will many others.

In this context, a tuple is a binding of a variable to a sequence of (zero or more) values or, by extension, pairs (or triples, etc.) of such bindings, depending on the number of variables used in the combination of for clauses and let clauses. A tuple stream is a sequence of such tuples that can be considered in turn.

Consider the for clause illustrated in Example 11-21, which calls on the movies document in Example 11-20 (seen in earlier chapters as well). Of course, as we can tell from the syntax of FLWOR expressions, a for clause cannot appear alone – at a minimum, it must be followed by a return clause.

Example 11-20   Reduced movie Example and Trivial studio Example

image

Example 11-21   Trivial for Clause and Corresponding Tuple Stream

image

Note that the result contains three instances of the variable $m, each instance being bound to a separate movie element from the original document. That result is a tuple stream comprising three tuples, each of which is an instance of the variable and the value (called its binding sequence) to which that instance is bound.

The for clause iterates over the items in the binding sequence, binding the variable to each of those items in turn. When the ordering mode applied to the FLWOR expression is ordered, the tuple stream is also ordered (in the same order as the binding sequence); when the ordering mode is unordered, the tuple stream’s order is implementation-dependent.
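A complete, if trivial, FLWOR expression built on such a for clause might look like the following sketch. The inline data is a hypothetical stand-in for the movies document.

```xquery
(: Hypothetical inline data; element and attribute names are illustrative only. :)
let $movies :=
  <movies>
    <movie myStars="5"><title>Movie One</title></movie>
    <movie myStars="3"><title>Movie Two</title></movie>
    <movie myStars="4"><title>Movie Three</title></movie>
  </movies>
(: The binding sequence has three items, so the for clause produces three tuples. :)
for $m in $movies/movie
return fn:string($m/title)
```

The return clause is evaluated once per tuple, so the result is a sequence of three title strings.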

A for clause can have multiple variable bindings, as shown in Example 11-22, which depends on the two documents seen in Example 11-20. In this case, the two variables, $m and $s, are each associated with a binding clause, but they are not independent of one another. Instead, each binding of $m is associated with every binding of $s. The XQuery 1.0 spec says, “The resulting tuple stream contains one tuple for each combination of values in the respective binding sequences.” Those of you familiar with the relational model – or with vector operations from your math classes – will recognize this as a cross product.

This concept can be extended arbitrarily to cover as many variables as the for clause supplies.

Example 11-22   for Clause with Multiple Variables and Corresponding Tuple Stream

image

image

image

If the ordering mode in effect for a FLWOR expression whose for clause defines multiple variables is ordered, then the first variable provides the primary sort order, the second provides the secondary sort order, and so forth.
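The cross product and its ordering can be sketched with two short literal sequences. The movie and studio names are invented, and the ordered expression is added only to make the ordering mode explicit.

```xquery
ordered {
  for $m in ("Movie One", "Movie Two"),
      $s in ("Studio A", "Studio B", "Studio C")
  (: 2 x 3 = 6 tuples; $m varies slowest and $s varies fastest. :)
  return fn:concat($m, " / ", $s)
}
```

All three pairings for "Movie One" appear before any pairing for "Movie Two", which is what it means for the first variable to provide the primary order.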

The let clause also binds variables with the values returned by expressions, but without iteration. Instead, the let clause binds its variables with the entire value of their respective expressions – the entire sequence, not one item of the sequence at a time. A let clause that binds two or more variables generates a single tuple containing all of the variable bindings, as illustrated in Example 11-23.

Example 11-23   The let clause and Resulting Tuple

image

image
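By way of contrast with the for clause, here is a sketch of a let clause that binds two variables. The names and values are hypothetical.

```xquery
let $titles := ("Movie One", "Movie Two", "Movie Three"),
    $count  := fn:count($titles)
(: One tuple only: $titles holds the whole sequence and $count holds 3. :)
return <summary count="{ $count }">{ fn:string-join($titles, ", ") }</summary>
```

Because there is a single tuple, the return clause is evaluated exactly once and produces a single summary element.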

The scope of the variables bound in a for clause or a let clause is every subexpression in the same FLWOR expression that follows the individual for clause or let clause in which the variable is bound (but not, of course, the expression to which the variable is bound). Consequently, an expression such as that in Example 11-24 is possible (if not necessarily useful in this case).

Example 11-24   Binding a Variable and Then Using It

image

image

One often misunderstood implication of the rule that the “scope of the variables bound … is every subexpression … that follows” is that a variable declared in one clause (a for clause, for example) can be apparently redeclared in a subsequent clause (a let clause, perhaps) in the same FLWOR expression, as illustrated in Example 11-25. However, that apparent redeclaration does no such thing. Instead, the second declaration is actually a declaration of a new variable of the same name, whose declaration obscures the previously declared variable of that name. As a result, expressions in clauses following the let clause (in this example) can never access the value of the variable $i declared in the for clause; all such efforts will see only the other variable $i declared in the let clause.

Example 11-25   Redeclaring Variables

image
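The shadowing behavior can be sketched in three lines; the values are arbitrary.

```xquery
for $i in (1, 2, 3)
let $i := $i * 10    (: a new $i; its initializing expression still sees the for clause's $i :)
return $i            (: only the let clause's $i is visible here: 10, 20, 30 :)
```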

In both for clauses and let clauses, each variable being bound may be specified to have an explicit type. If the value bound to a variable with an explicitly declared type does not match that type, using the rules of sequence type matching, then a type error is raised.

In a for clause, a bound variable can be accompanied by a positional variable, whose value is an integer that represents the position of each value in the bound variable’s binding sequence in turn. Repeating Example 11-21 with a positional variable, we get the same results, as shown in Example 11-26.

Example 11-26   Trivial for Clause with Positional Variable

image

image
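A positional variable is introduced with the keyword at, as in this sketch over a hypothetical sequence of titles.

```xquery
for $t at $i in ("Movie One", "Movie Two", "Movie Three")
(: $i takes the values 1, 2, 3 as $t is bound to each item in turn. :)
return fn:concat($i, ": ", $t)
```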

11.6.2 The where Clause

As the grammar in Appendix C: XQuery 1.0 Grammar shows, FLWOR expressions can optionally include a where clause, the purpose of which is to filter the tuples generated by the preceding for and/or let clauses. The ExprSingle contained in the where clause, called the where expression, is evaluated once for each of those tuples, and only those tuples for which the effective Boolean value of the where expression is true are retained.

In Example 11-27, we have coded a FLWOR expression fragment in which a where clause is applied to a positional variable generated in a for clause that is otherwise identical to Example 11-26.

Example 11-27   Trivial for Clause and where Clause Using Positional Variable

image

image

Note that the result in Example 11-27 is identical to the result in Example 11-26, except that one tuple – the tuple in which the value of $i and the value of the myStars attribute are both 3 – is absent.

And, yes, the where clause really is that simple. We leave as an exercise for the reader to determine the result of the FLWOR fragment in Example 11-28.

Example 11-28   Another where Clause

image
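For comparison, here is a sketch of a where clause that filters on data rather than on position. The inline data is hypothetical and is not the answer to the exercise above.

```xquery
(: Hypothetical inline data; element and attribute names are illustrative only. :)
let $movies :=
  <movies>
    <movie myStars="5"><title>Movie One</title></movie>
    <movie myStars="3"><title>Movie Two</title></movie>
    <movie myStars="4"><title>Movie Three</title></movie>
  </movies>
for $m at $i in $movies/movie
where xs:integer($m/@myStars) ge 4     (: keeps the first and third tuples :)
return fn:concat($i, ": ", $m/title)
```

The positional variable records each item's position in the binding sequence, so the retained tuples still carry the positions 1 and 3; filtering does not renumber them.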

11.6.3 The order by Clause

The order by clause is used to reorder the tuples in the tuple stream generated by the for and/or let clauses, possibly filtered by a where clause. If a FLWOR expression does not contain an order by clause, then the order of tuples is determined by the for and/or let clauses and by the ordering mode (ordered or unordered, as discussed earlier in Section 11.5.13). If an order by clause is present, then it determines the order of those tuples based on values present in the tuples themselves. (Note that the ordering done by the order by clause is done by values and not by nodes or by node identity.)

An order by clause has one or more ordering specifications (OrderSpec), each of which contains an ExprSingle and an optional ordering modifier (OrderModifier). The ExprSingle is evaluated using the variable bindings in each tuple. The relative ordering of two tuples is determined by evaluating each OrderSpec, in left-to-right sequence, until an OrderSpec is encountered for which the two tuples do not compare equal. When evaluating an OrderSpec:

• The result of the ExprSingle is atomized; if the result of atomization is neither a single atomic value nor an empty sequence, then an error is raised.

• Values of type xdt:untypedAtomic are cast to xs:string.

• The values of the ExprSingle in every tuple in the tuple stream must be able to be cast into a single data type that has the gt (value greater than) operator defined; if there is no such type, then an error is raised.

The optional OrderModifier can specify that the ordering is to be ascending or descending (that is, whether the tuples are delivered with the lowest values appearing first or last). It can also specify whether empty sequences and, for values of type xs:float and xs:double, the special value NaN (Not a Number), are sorted as greater than all other values (empty greatest) or less than all other values (empty least). The OrderModifier can also specify a collation that governs how xs:string (and, because of the cast cited earlier, xdt:untypedAtomic) values are compared.

If the order by clause specifies stable, then for any two tuples that compare equal for every OrderSpec, the relative order of those tuples is the same as in the original tuple stream. If stable is not specified, then the relative order of two such tuples is implementation-dependent.

Example 11-29 illustrates a FLWOR expression fragment that uses an order by clause.

Example 11-29   Trivial for Clause, where Clause, and order by Clause

image

image
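The following sketch combines the clauses discussed so far and adds a return clause so that it can be evaluated as written. The inline data is hypothetical.

```xquery
(: Hypothetical inline data; element and attribute names are illustrative only. :)
let $movies :=
  <movies>
    <movie myStars="5"><title>Movie One</title></movie>
    <movie myStars="3"><title>Movie Two</title></movie>
    <movie myStars="4"><title>Movie Three</title></movie>
  </movies>
for $m in $movies/movie
where xs:integer($m/@myStars) ge 3
(: Highest rating first; "stable" keeps equally rated movies in their original order. :)
stable order by xs:integer($m/@myStars) descending empty least
return fn:string($m/title)
```

The explicit xs:integer conversion is a deliberate choice. Without schema validation the attribute value is untyped, so it would be cast to xs:string and compared as a string, which would place "10" before "9".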

11.6.4 The return Clause

Every FLWOR expression contains a return clause (which is why previous examples in this section are characterized as FLWOR fragments). The ExprSingle contained in a return clause is evaluated once for each tuple that is produced by the for clauses and/or let clauses and/or where clause and/or order by clause. The results of these evaluations are concatenated (as if they were assembled using the comma operator) into a sequence; the resulting sequence is the value of the FLWOR expression.

Example 11-29 can be completed by adding a return clause, as shown in Example 11-30.

Example 11-30   A Complete FLWOR Expression

image

The result in Example 11-30 is a sequence of two values, each the value of the attribute myStars of the element movie contained in a tuple in the tuple stream produced by the for clause, filtered by the where clause, and sorted by the order by clause.
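The concatenation performed by the return clause can be sketched with a return expression that itself yields two items per tuple; the values are arbitrary.

```xquery
for $i in (1, 2)
(: Each evaluation yields a two-item sequence; the results are flattened together. :)
return ($i, $i * 10)
```

The value of this expression is the flat four-item sequence 1, 10, 2, 20 and not a sequence of two nested sequences, because Data Model sequences never nest.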

In Appendix A: The Example, you will see other examples of FLWOR expressions.

11.7 Error Handling

XQuery provides three categories of errors that can be raised: static errors that can be raised only during the static analysis phase (such as a syntax error), dynamic errors that can be raised during either the static analysis phase or the dynamic evaluation phase (such as division by zero), and type errors that can also be raised during either the static analysis phase (such as the static type of an expression being incompatible with the type required by the context) or the dynamic evaluation phase (such as the dynamic type of a value being incompatible with the static type of the expression producing that value).

In the XQuery 1.0 specification and all of its accompanying specifications, errors are indicated by the convention err:XXYYnnnn, where “err” is used in these documents as a namespace prefix for the namespace “http://www.w3.org/date/xqt-errors” (the final value for “date” will be determined by the publication date of the final Recommendation for XQuery); “XX” is a two-letter code identifying the particular document in which the error is defined (e.g., “XQ” for the XQuery specification or “FO” for the Functions and Operators spec); “YY” is another two-letter code indicating the category of error (e.g., “ST” for an XQuery static error or “AR” for a Functions and Operators arithmetic error); and “nnnn” is a unique numeric code for the specific error. For example, “err:XQST0032” identifies the static error that results from a query prolog containing more than one base URI declaration. (By the way, “err” is not a predefined prefix and must be declared explicitly if you wish to use it.)

If the XQuery implementation reports errors to the external environment from which XQuery modules are invoked, it does so in the form of a URI reference that is derived from the QName of the error. The error mentioned in the previous paragraph, “err:XQST0032”, would be reported as the URI reference “http://www.w3.org/date/xqt-errors#XQST0032”. Implementations may also return a descriptive string along with the URI reference of an error, as well as any values that the external environment might use to attempt to recover from the error or to diagnose a problem.

The Functions and Operators specification provides a special function, fn:error ( ), that returns no value at all (in fact, its return type is explicitly “none”). Its sole purpose is to permit a query expression to raise an error under user-defined circumstances. If your XQuery needs to raise the error mentioned twice in this section, you could do so by invoking fn:error (“err:XQST0032”). As you will find in the F&O specification, this function allows argument values of other types than the QNames of error conditions, including strings that might contain a human-readable message.
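As a sketch of raising a user-defined error, the following assumes the two-argument signature found in the final Functions and Operators Recommendation (an error QName plus a description string), which differs from the single-string form shown above. The namespace URI, the error name, and the rating rule are hypothetical.

```xquery
let $stars := 7
return
  if ($stars ge 1 and $stars le 5)
  then $stars
  else
    (: Hypothetical error name in a hypothetical namespace. :)
    fn:error(
      fn:QName("http://example.com/errors", "my:BADSTARS"),
      "myStars must be between 1 and 5"
    )
```

Because 7 lies outside the permitted range, evaluating this query raises the error instead of returning a value; how the error is reported is up to the implementation.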

11.8 Modules and Query Prologs

In Appendix C: XQuery 1.0 Grammar, we see the syntax for XQuery modules, including module prologs. In this section, we take a closer look at the reasoning behind modules, the components of modules and prologs, and how they are used.

What is a module? Why does the concept exist in XQuery? According to the XQuery 1.0 spec, a module is “a fragment of XQuery code that conforms to the Module grammar and can independently undergo the static analysis phase.” The first part of that definition is almost a tautology, but the second part gives a pretty good clue: An XQuery module is a bit of XQuery code that can be compiled separately. (The XQuery 1.0 spec doesn’t mention compilation, but that intent is easy to discern.)

As anybody who has developed complex software systems knows, breaking applications into modules that can be written, compiled, and even debugged separately, and then allowing those modules to interact with one another, has many advantages, not the least of which is the potential for code reuse. That lesson was not lost on XQuery’s definers.

In XQuery, there are two kinds of modules: main modules and library modules. A main module is one that contains both a prolog and a query body (an expression that can be evaluated), while a library module is one that includes only a module namespace declaration and a prolog. It’s easy to figure out the purpose of a main module: It’s the “thing” that can be executed, or evaluated. By contrast, a library module is one that cannot be evaluated directly, but that provides declarations for functions and variables that can be imported into other modules (ultimately into a main module).
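The two kinds of modules might be sketched as a pair of files. The namespace URI, the function name, and the file name used as a location hint are hypothetical, and the meaning of the hint depends on the implementation. First, a library module:

```xquery
(: Library module: a module namespace declaration followed by a prolog, with no query body. :)
module namespace ml = "http://example.com/movies-lib";

declare function ml:is-good($stars as xs:integer) as xs:boolean
{
  $stars ge 4
};
```

Then a main module that imports it and supplies the query body:

```xquery
(: Main module: a prolog followed by a query body that can be evaluated. :)
import module namespace ml = "http://example.com/movies-lib" at "movies-lib.xq";

ml:is-good(5)
```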

Every user of XQuery uses main modules, even if it’s not obvious that they are doing so. The reason is obvious from the grammar: The version declaration and everything in the query prolog are optional! Consequently, this is a perfectly valid XQuery main module: 42. By contrast, because library modules require explicit syntax, they are likely to be used in more complex applications.

Before delving into the details of prologs, main modules, library modules, and module namespace declarations, let’s consider the first bit of syntax in a Module, the optional VersionDecl.

Knowing that the future is difficult to predict, the definers of XML realized that requirements not yet known might lead to new versions of the language; as a result, authors of XML documents are free – even encouraged – to indicate the version of XML used by those documents; they are also able to indicate the character coding (e.g., UTF-8 or UTF-16) used to encode those documents. Similarly, there may well be future versions of XQuery, so it is desirable to allow authors of XQueries the freedom to indicate the version of XQuery being used, as well as the freedom to specify an encoding declaration – the name of the character coding in which they are encoded. Currently, the only version number allowed in an XQuery is “1.0.” The encodings permitted are defined by each XQuery implementation, but we expect that all implementations will support at least one of UTF-8 or UTF-16 (which is precisely what XML requires). Example 11-31 provides a few examples of valid VersionDecls.

Example 11-31   Examples of VersionDecl

image
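A main module that begins with a version declaration can be as small as the following sketch. Whether a particular encoding name is accepted depends on the implementation.

```xquery
xquery version "1.0" encoding "UTF-8";

(: The query body: the same trivial expression mentioned earlier in this section. :)
42
```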

11.8.1 Prologs

A MainModule is a Prolog followed by a QueryBody. We have seen examples of QueryBodys, and we have referred to the contents of the Prolog. Now it’s time to see exactly what the Prolog is.

The Prolog (frequently called the query prolog to distinguish it from the XML document prolog) provides syntax that allows authors of XQuery modules to declare several things that affect the behaviors of XQuery expressions. Some of the items that can be specified in a query prolog – such as boundary space policy, ordering mode, and default collation – override the implementation defaults for those items. Others – such as variable and namespace declarations – may augment implementation defaults, but do not override them. Information about each of these items is available in Table 11-1, and you can find the details of which can be overridden and which can be augmented in the XQuery 1.0 specification in its section entitled “The Static Context.”

• declare boundary-space – Overrides the implementation-defined boundary space policy that determines whether boundary whitespace is preserved by element constructors during evaluation of the query; preserve means that boundary whitespace is preserved, and strip means that it is deleted.

• declare default collation – Overrides the implementation-defined default collation used for character string comparisons in the module that do not specify an explicit collation. The specified collation must be among the statically known collations or an error is raised.

• declare base-uri – Overrides the implementation-defined default base URI that is used to resolve relative URIs within the module.

• declare construction – Overrides the implementation-defined default that determines whether attribute and element nodes being copied into a constructed element or document node retain (preserve) or lose (strip) existing type information.

• declare ordering – Overrides the implementation-defined default ordering mode (ordered or unordered) applied to all expressions in the module that do not have an explicit ordering mode.

• declare default order – Overrides the implementation-defined default that determines whether empty sequences sort less than (empty least) or greater than (empty greatest) other values.

• declare copy-namespaces – Controls the namespace bindings that are assigned when existing element nodes are copied by element constructors (preserve or no-preserve, as well as inherit or no-inherit).

It is a syntax error if any of these declarations are specified more than once in a query prolog. Some other declarations are permitted to appear more than once:

• import schema – Imports the element and attribute declarations and the type definitions from a schema into the in-scope namespaces, possibly binding a namespace prefix to the target namespace of the schema. Multiple schemas can be imported, but the definitions they contain must not conflict or an error is raised. Location hints may be provided, but their meaning is completely determined by the XQuery implementation.

• import module – Imports the function and variable declarations from one or more library modules into the function signatures and in-scope variables of the importing module. Modules are identified by their target namespaces, and all modules with a given target namespace are imported when that target namespace is specified. Importing a module that in turn imports another module does not make the function and variable declarations of that last module available to the original importing module. Location hints may be provided, but their meaning is completely determined by the XQuery implementation.

• declare namespace – Augments the implementation-defined predefined (statically known) namespaces and prefixes, making an additional namespace available to the query.

• declare default element namespace and declare default function namespace – Specifies the namespace URI that is associated with unprefixed element (and type) names and function names, respectively, within a module.

• declare variable – Declares one or more variables, optionally with a type. Variables can be declared to be external or can be given an initial value. External variables can be given a value only by the external environment from which the module is invoked.

• declare function – Declares one or more functions (along with their parameters) that can be invoked from expressions contained in the module. Functions can be declared to be external or can be declared with an (XQuery) expression that comprises the function body. External functions are implemented outside of the query environment. An external function is one written in a language other than XQuery. We expect that many XQuery implementations will support external functions written in Java, C#, and other common programming languages.

• declare option – Declares an implementation-defined option, the meaning of which is completely defined by the XQuery implementation.
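Several of the declarations listed above can be combined in one prolog, as in the following sketch, which assumes the syntax of the final XQuery 1.0 Recommendation. The namespace URI, the variable, and the function are hypothetical.

```xquery
xquery version "1.0";

(: Setters: each may appear at most once. :)
declare boundary-space strip;
declare ordering ordered;
declare default order empty least;
declare construction strip;
declare copy-namespaces preserve, inherit;

(: Namespace declarations must precede variable and function declarations. :)
declare namespace my = "http://example.com/my";

declare variable $my:minStars as xs:integer := 4;

declare function my:is-good($stars as xs:integer) as xs:boolean
{
  $stars ge $my:minStars
};

(: Query body :)
my:is-good(5)
```

The grouping is not merely stylistic: setters, namespace declarations, and imports have to come before any variable, function, or option declarations.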

11.8.2 Main Modules

A main module is one that contains, in addition to a (possibly empty) query prolog, a query body – an expression that is evaluated when the module is invoked. A query has exactly one main module. Evaluating the expression that is the query body of a main module is the same as executing, or running, the query.

How a main module is invoked is very much left to the XQuery implementation. Some implementations may provide a command-line interface or a graphical user interface (GUI) that allows a query to be typed directly by a user. Other implementations may provide ways to embed XQueries into some other programming language, such as Java, C, or Python. Still others might allow applications to invoke methods in some application programming interface (API) and pass the text of XQueries and main modules for evaluation (see Chapter 14, “XQuery APIs,” for more information). Still others might provide for a GUI facility that builds queries without having to enter the character strings conforming to XQuery syntax. We expect that all implementations will provide at least one of these methods, and that some will provide more than one.

However, the sequence of events once a main module is invoked is well defined. In fact, it is generally described in Section 11.2.2, “The XQuery Processing Model.” That description does not cover every detail, so here is a more precise list of the steps involved in the invocation of a main module. Of course, before these steps can be performed, the invoking environment has to provide input data in the form of an XQuery Data Model instance, possibly by parsing a serialized XML document into an Infoset, perhaps performing Schema validation on that Infoset to produce a PSVI, and then transforming the result into a Data Model instance. (Note that several of these steps depend on the implementation’s providing optional features: schema import, modules, and static typing are all optional.)

• The in-scope schema definitions in the static context are initialized, possibly by extracting them from actual XML schemas, as well as through implementation-defined means.

• The MainModule undergoes static analysis. This involves several steps of its own.

– The module is transformed into an operation tree that represents the query (the transformation to an operation tree is a definitional technique – implementations are free to handle this in any way they wish).

– The static context is initialized by the implementation and then modified according to information in the Prolog, which is done in two steps: first, the in-scope schema definitions are augmented by the schema imports in the Prolog; second, the static context is augmented with function and variable declarations from modules that are imported.

– The augmented static context is used to resolve names (schema type names, function names, namespace prefixes, and variable names) appearing in the module.

– The operation tree is normalized by transforming various implicit operations (such as atomization, type promotion, and determination of Effective Boolean Values) into explicit operations.

– Every expression in the query is assigned a static type.

• The MainModule undergoes dynamic analysis. Several actions are involved in dynamic analysis.

– The operation tree is traversed, evaluating subexpressions at the leaves of the tree, and then combining their results when evaluating the subexpressions at the appropriate branches of the tree.

– The dynamic context is augmented or changed by creation of new Data Model instances, by binding values to variables, etc.

– The dynamic type of each expression is determined as the expression is evaluated. If the dynamic type of an expression is incompatible with the static type of the expression, an error is raised.

• The result of dynamic analysis is often (but not necessarily) serialized into a character string; if the result of the dynamic analysis is an XML document, then the result of the XQuery is an XML document in character string form.
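The normalization step in the list above can be made concrete with a small sketch. The element constructor and the variable name $item are invented for illustration; the point is only that each implicit operation has an explicit equivalent that an implementation may (conceptually) substitute during static analysis.

```xquery
(: $item is a hypothetical, untyped element built just for this sketch. :)
let $item := <item><price>10</price></item>
return (
  (: Implicit atomization: the price element is atomized before the addition. :)
  $item/price + 1,

  (: The same computation with the atomization written out explicitly. :)
  fn:data($item/price) + 1,

  (: Implicit Effective Boolean Value: a non-empty node sequence is true. :)
  if ($item/price) then "has price" else "no price",

  (: The same test with the Effective Boolean Value computed explicitly. :)
  if (fn:boolean($item/price)) then "has price" else "no price"
)
```

The first two items of the result are the same value, as are the last two, because the explicit forms simply spell out what the implicit forms already mean.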

Complete examples of main modules can be found in Appendix A: The Example.

11.8.3 Library Modules

Library modules support the notion of modularizing applications, which is done for reasons of design, maintenance, and code reuse. A library module comprises only a ModuleDecl and a Prolog. The ModuleDecl defines the target namespace of the library module, while the Prolog contains declarations for functions and variables that are exported (made available) for importation (inclusion) by other library modules and by main modules. Example 11-32 contains a scenario of importing library modules.

Example 11-32   Importing Library Modules

[Example 11-32 appears as an image in the source; the listing is not reproduced here.]

Some of the components of Example 11-32 deserve a few additional words of discussion.

• import module namespace myLibs … at …: The way in which a module is imported into another module (main or library) is to import the module’s namespace. All of the functions defined in a library module are named with QNames whose namespace is the module’s target namespace. The at clause allows the query author to provide the XQuery implementation with a hint about where the code for a library module might be found.

• module namespace myLibs: Every module other than a main module is declared by specifying its module namespace.

• import module namespace myLibs: A library module is allowed to import its own namespace. Doing so does not cause an infinite loop of a module including itself forever. Instead, it allows modularization of a single namespace into multiple “physical” modules that can be “merged” into one module for query evaluation purposes.

• import module namespace rev: No surprise – one module is allowed to import different module namespaces to use the functions declared in those other modules.
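Since the listing for Example 11-32 is not available in this text, the following is not that example but a hypothetical sketch of the same kind of scenario. All URIs, file names, and function names are assumptions, and each section marked with a "File" comment would be stored as a separate module; the four sections do not form a single query. In this sketch the self-import is written without a prefix, because the prefix myLibs is already bound by the module declaration.

```xquery
(: ---- File 1: main module (hypothetical main.xq) ---- :)
xquery version "1.0";
import module namespace myLibs = "http://example.com/myLibs" at "myLibs-a.xq";

for $m in doc("movies.xml")//movie
return myLibs:label($m)

(: ---- File 2: library module (hypothetical myLibs-a.xq) ---- :)
xquery version "1.0";
module namespace myLibs = "http://example.com/myLibs";

(: Import of the module's own target namespace: brings in the functions
   of another "physical" module that shares the same namespace. :)
import module "http://example.com/myLibs" at "myLibs-b.xq";

(: Import of a different module namespace. :)
import module namespace rev = "http://example.com/reviews" at "reviews.xq";

declare function myLibs:label($m as element()) as xs:string
{
  concat(myLibs:title($m), " (", rev:goodReviews($m), " good reviews)")
};

(: ---- File 3: library module (hypothetical myLibs-b.xq) ---- :)
xquery version "1.0";
module namespace myLibs = "http://example.com/myLibs";

declare function myLibs:title($m as element()) as xs:string
{
  string($m/title)
};

(: ---- File 4: library module (hypothetical reviews.xq) ---- :)
xquery version "1.0";
module namespace rev = "http://example.com/reviews";

declare function rev:goodReviews($m as element()) as xs:integer
{
  count($m/review[@rating = "good"])
};
```

Note that the main module imports only myLibs. It can call myLibs:label, but it cannot call rev:goodReviews directly, because importing a module does not make available the declarations of the modules that it imports in turn.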

11.9 A Longer Example with Data

You will find more examples of XQuery expressions, along with the source data on which they operate to give the specified results, in Appendix A: The Example.

11.10 XQuery for SQL Programmers

Before we complete our discussion of XQuery 1.0, we think it’s worth responding to requests from any number of SQL programmers who, while learning XQuery 1.0, have asked us to explain XQuery concepts in terms of more familiar SQL concepts. Not at all incidentally, the similarity between the two languages is due in part to the fact that they share many of the same concepts – and at least one of the same creators (our friend and colleague, Don Chamberlin of IBM)!

Arguably, the most important syntax element of XQuery is the FLWOR expression, while the best analogy in SQL is the query expression, better known as the SELECT expression. (Many programmers refer to this as the SELECT statement, but that statement is used only in interactive SQL and is not used in SQL programs.)

Figure 11-2 graphically illustrates the relationships between the clauses, or subexpressions, of FLWOR and the analogous syntax elements of SQL’s SELECT.

Figure 11-2 Relationship between FLWOR and SELECT.

Note that XQuery’s let clause has no analog in SQL’s SELECT expression,10 while SQL’s GROUP BY and HAVING clauses have no analog in XQuery (strictly speaking, SQL’s HAVING clause is merely another WHERE clause that uses a different keyword and that is applied to the result of the GROUP BY clause). Finally, note that SQL’s ORDER BY clause is not actually part of the SELECT expression, but is used only in cursor declarations and a very limited number of additional places.

Of course, in SQL, the FROM clause identifies tables from which rows are chosen, joining them with rows from other tables if specified, while the for clause in XQuery identifies XML nodes. There are other important differences as well, but it’s not our purpose in this book to detail the similarities and differences between these two popular query languages. We won’t belabor the analogy further, except to mention that XQuery’s for clause supports joins in which nodes from one document are combined with nodes from another document, just as the joins specified in SQL’s FROM clause combine rows from one table with rows from another table.
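To make the analogy concrete, here is a minimal sketch of a join written as a FLWOR expression, with a roughly corresponding SQL query shown in a comment. The documents movies.xml and reviews.xml, their element and attribute names, and the SQL tables and columns are all hypothetical.

```xquery
(: A roughly analogous SQL query, for orientation only:

     SELECT m.title, r.stars
     FROM   movies m, reviews r
     WHERE  m.id = r.movie_id AND r.stars >= 4
     ORDER  BY m.title
:)
for $m in doc("movies.xml")/movies/movie,          (: like "FROM movies m"   :)
    $r in doc("reviews.xml")/reviews/review        (: like ", reviews r"     :)
let $stars := xs:integer($r/@stars)                (: no direct SELECT analog :)
where $m/@id = $r/@movieId and $stars ge 4         (: like WHERE             :)
order by $m/title                                  (: like ORDER BY          :)
return <result title="{ $m/title }" stars="{ $stars }"/>  (: like the SELECT list :)
```

The two for bindings range over nodes from two different documents, and the where clause supplies the join condition, much as the FROM and WHERE clauses do for rows from two tables.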

There are other concepts that the two languages share but for which there are important differences. For example, SQL’s collection of data types is not the same as XQuery’s. Many of the data types in the two languages are similar in purpose, but the details vary, often considerably.

In Table 11-5, we have provided a correspondence between SQL’s set of data types and the XQuery Data Model’s set of data types. Note that most of the XQuery Data Model’s types are shown with the namespace prefix “xs:”, indicating that those types are defined in XML Schema. Other types are shown with the namespace prefix “xdt:” to indicate that they are defined by the Data Model itself. Some of SQL’s data types have no analogy in XQuery, and some types used in XQuery have no analogy in SQL; we use a dash (“–”) to indicate that situation. Chapter 15, “SQL/XML,” has more discussion of type correspondences between the two languages.

Table 11-5

SQL Data Types vs. XQuery 1.0 types

| SQL data types | XQuery 1.0 types |
| --- | --- |
| CHARACTER, CHARACTER VARYING, CHARACTER LARGE OBJECT, NATIONAL CHARACTER, NATIONAL CHARACTER VARYING, NATIONAL CHARACTER LARGE OBJECT | xs:string |
| – | xs:normalizedString |
| – | xs:token |
| – | xs:language |
| – | xs:NMTOKEN |
| – | xs:NMTOKENS |
| – | xs:Name |
| – | xs:NCName |
| – | xs:ID |
| – | xs:IDREF |
| – | xs:IDREFS |
| – | xs:ENTITY |
| – | xs:ENTITIES |
| BOOLEAN | xs:boolean |
| NUMERIC, DECIMAL | xs:decimal |
| INTEGER | xs:integer |
| – | xs:nonPositiveInteger |
| – | xs:negativeInteger |
| BIGINT | xs:long |
| INTEGER | xs:int |
| SMALLINT | xs:short |
| – | xs:byte |
| – | xs:nonNegativeInteger |
| – | xs:unsignedLong |
| – | xs:unsignedInt |
| – | xs:unsignedShort |
| – | xs:unsignedByte |
| – | xs:positiveInteger |
| FLOAT, REAL | xs:float |
| FLOAT, DOUBLE | xs:double |
| – | xs:duration |
| TIMESTAMP WITH TIME ZONE, TIMESTAMP WITHOUT TIME ZONE | xs:dateTime |
| DATE WITH TIME ZONE, DATE WITHOUT TIME ZONE | xs:date |
| TIME WITH TIME ZONE, TIME WITHOUT TIME ZONE | xs:time |
| – | xs:gYearMonth |
| – | xs:gYear |
| – | xs:gMonthDay |
| – | xs:gDay |
| – | xs:gMonth |
| BINARY LARGE OBJECT | xs:hexBinary |
| BINARY LARGE OBJECT | xs:base64Binary |
| – | xs:anyURI |
| – | xs:QName |
| – | xs:NOTATION |
| INTERVAL (day-time interval) | xdt:dayTimeDuration |
| INTERVAL (year-month interval) | xdt:yearMonthDuration |
| XML | xs:anyType |
| – | xs:anySimpleType |
| – | xdt:untyped |
| (Structured types?) | Node types |
| (Structured types?) | User-defined complex types |
| ROW | – |
| REF | – |
| ARRAY | (List types, sequences) |
| MULTISET | (List types, sequences) |
| DATALINK | – |
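A few of the XQuery types in the table can be exercised directly. The following is a small sketch using only built-in constructor functions and the instance of, cast as, and castable as expressions; the literal values are arbitrary.

```xquery
(
  (: xs:short is derived from xs:integer, so this test is true. :)
  xs:short(7) instance of xs:integer,

  (: xs:integer is in turn derived from xs:decimal, so this is also true. :)
  xs:integer(7) instance of xs:decimal,

  (: A string in the lexical space of xs:date can be cast to that type. :)
  "2006-01-31" cast as xs:date,

  (: castable as tests whether a cast would succeed without raising an error. :)
  "12.50" castable as xs:decimal,
  "abc" castable as xs:integer
)
```

The derivation relationships tested here have no counterpart among SQL's numeric types, where SMALLINT, INTEGER, and DECIMAL are related by conversion rules and not by subtyping.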

Many of SQL’s expressions have analogs in XQuery (the reverse is true as well). Both languages have arithmetic expressions, string expressions, comparison expressions (and predicates), datetime expressions, and so forth. The details naturally vary, because the languages’ needs are different, as are their data types’ details.

With this modest discussion, we believe that most SQL programmers will be able to use this chapter to begin learning XQuery and applying it in their own applications.

11.11 Chapter Summary

In this rather lengthy chapter, we’ve taken a fairly close look at XQuery proper, after introducing several basic concepts. We discussed every important type of expression in some detail, most of them accompanied by illustrative examples. We spent considerable space on the FLWOR expression, examining each of its clauses in turn, because of its key role in XQuery. We also gave you an overview of XQuery modules, their contents, and how they are used.

After studying this chapter, you are qualified to take that shiny new XQuery engine (already available from major vendors, minor organizations, and open source efforts) for a serious test drive. No single chapter (or book, for that matter) can possibly cover every possible twist and turn of a language as complete – and as complex – as XQuery, but we think this chapter has provided a good introduction.


1 XML Query (XQuery) Requirements, W3C Working Draft (Cambridge, MA: World Wide Web Consortium, 2003). Available at: http://www.w3.org/TR/xquery-requirements/.

2 XML Information Set (Second Edition), W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 2004). Available at: http://www.w3.org/TR/xml-infoset/.

3 In this context, “augment” means to add to the value; for example, an implementation is permitted to add more function signatures to the collection already defined in the fn namespace and by the constructors for all built-in types.

4 A good discussion of functional programming can be found in Wikipedia, at http://en.wikipedia.org/wiki/Functional_programming.

5 The term lexical space is defined to be “the set of valid literals for a datatype” in XML Schema Part 2: Datatypes, W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 2001). Available at: http://www.w3.org/TR/xmlschema-2/.

6 An item is either a node or an atomic value.

8 Extensible Markup Language (XML) 1.0, Third Edition, W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 2004). Available at: http://www.w3.org/TR/REC-xml.

9 XML Schema Part 1: Structures, Second Edition, W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 2004). Available at: http://www.w3.org/TR/xmlschema-1/.

10 However, it’s not irrational to suggest that the let clause is similar to SQL’s SELECT expression FROM some_single_row_table.