Chapter 6. Iteration

Introduction

Most programming languages adopt one of two styles of expressions: iteration or application. Iterative expressions explicitly loop through sequences one member at a time. Applicative expressions apply operations to entire sequences at once (implicitly iterating through the sequence and applying the operation to each member).

As we saw in Chapter 3, navigation paths have an applicative style—each step implicitly iterates over the nodes selected by the previous step. However, the rest of XQuery has adopted an iterative style of programming. This chapter explores the XQuery expressions that perform iteration.

The FLWOR (pronounced “flower”) expression is the most powerful of these, capable of introducing variables, filtering sequences, sorting and grouping, iterating over sequences to produce some result, and joining different data sources, all at once.

XQuery also provides iteration expressions for existential (some) and universal (every) quantification, which test whether a condition holds for any or every member of a sequence, respectively.

FLWOR

As its name suggests, FLWOR consists of five major clauses: for, let, where, order by, and return. Every FLWOR begins with one or more for and/or let clauses (in any order), followed by an optional where clause, an optional order by clause, and finally one return clause. The five clauses are demonstrated by the example in Listing 6.1.

Example 6.1. FLWOR consists of five clauses

for $i at $pos in doc("puzzles.xml")//puzzle
let $j := doc("games.xml")//game
where $i/@name = $j/@name
order by @price descending
return $i

Before I explain how FLWOR works, let's first compare FLWOR to equivalent expressions in the popular SQL and XSLT languages. The XQuery FLWOR expression is very similar to SQL SELECT statements and XSLT <xsl:for-each/> loops. If you are already familiar SQL or XSLT, then you may find it helpful to think of FLWOR in terms of the equivalent expressions from these languages. If you're not already familiar with SQL or XSLT, you may want to skip ahead to Section 6.2.3.

Compared to SQL

The similarities between SQL and XQuery are striking. Both XQuery and SQL have a where clause, which filters the overall expression according to a boolean condition. Both XQuery and SQL have an order by clause, which sorts the selection according to some criteria. Both languages also can declare variables and assign them to values. Listing 6.2 shows a SQL query similar to FLWOR.

Example 6.2. SQL queries also contain five clauses

DECLARE @Variable1, @Variable2, ...
SELECT Column1, Column2, ...
FROM Table1, Table2, ...
WHERE ...
ORDER BY ...

Where the languages differ—aside from the fact that SQL is generally case-insensitive while XQuery is case-sensitive—are the ways in which they iterate over data and construct results.

In SQL, all sequences are flat. Consequently, a simple FROM clause naming tables (or views, or subqueries) suffices to describe the data sources, and a simple SELECT clause suffices to list the values to be selected from these sources. SQL is nonpositional, so you can be vague about the order of iteration over these data sources.

In XQuery, the sequences may be flat or they may be hierarchical, and they are almost always ordered. Consequently, the for clause may simply name collections like in SQL), or it may describe paths or other more complex expressions that navigate these collections, and the order in which the iteration is performed is significant. Similarly, the return clause doesn't simply return values from these data sources, but may construct entirely new shapes (which are also ordered).

The similarities between SQL and XQuery aren't accidental. SQL has been a highly successful language for data processing, and its features have evolved to fulfill common needs. XQuery has these same needs and more, and therefore provides functionality that includes and surpasses that of SQL.

Compared to XSLT

The correspondence between XSLT and XQuery may be less obvious, but they are actually more closely related than SQL and XQuery, because both XSLT and XQuery both work with hierarchical, ordered data sources. In fact, many people have questioned the need for XQuery at all, when XSLT 1.0 can express so many of the same concepts.

The selection criteria of the <xsl:for-each> loop is similar to the for clause of FLWOR. The main difference is that a single FLWOR expression may have any number of for clauses, but you have to nest <xsl:for-each> loops to achieve the same effect. Variable assignment in XSLT is performed using <xsl:variable> and is similar to the XQuery let clause.

The where clause of XQuery can be achieved in XSLT using either <xsl:if> or else predicates on the select expression. The order by clause of XQuery loosely corresponds to the <xsl:sort> expression of XSLT, and the return clause in a FLWOR expression corresponds to the body of the <xsl:for-each> loop. Listing 6.3 shows an XSLT expression similar to the XQuery FLWOR expression.

Example 6.3. XSLT for-each loops are similar to FLWOR

<xsl:for-each select="$sequence">
    <xsl:variable name="i" value="./@attr"/>
    <xsl:sort />
    <return/>result here</return>
</xsl:for-each>

The most obvious difference between XSLT and XQuery is that XSLT is expressed using XML while XQuery is not. Other less obvious but even more significant differences are that XQuery allows sequences of atomic values, has a richer type system, and allows composition (iterating over the results of other expressions). It is also somewhat easier to join two data sources in XQuery than in XSLT, or to express certain kinds of grouping operations.

As with SQL, the similarities between XSLT and XQuery aren't accidental. Both languages must navigate and construct XML, and must handle all of its nuances. In many ways, XQuery is a refinement of the ideas that preceded it.

Introducing Variables

The FLWOR expression is one of several XQuery expressions that introduces variable assignments. FLWOR actually introduces three kinds of variable bindings.

One kind of variable assignment in FLWOR occurs with the let clause. In fact, the simplest possible FLWOR expression consists of a single let clause followed by a return clause (see Listing 6.4).

Example 6.4. The simplest FLWOR expression consists of let and return

let $variable := "value"
return $variable

The let keyword is followed by a variable ($name), optionally the variable's type, the assignment operator (:=), and any XQuery expression at all (even another FLWOR).

The effect is to introduce the variable named by the let clause into scope and assign it to the given expression (the binding). As shown in Listing 6.5, each variable is in scope for all the rest of the FLWOR expression—that is, the where, order by, and return clauses, and any other variables introduced after it. (However, a variable cannot refer to itself in its own binding.)

Example 6.5. Variables are in scope for all the rest of the FLWOR

let $x := 1
let $y := $x
where $x > 0 or $y < 0
order by $x, $y
return ($x, $y)

Variables cannot be reassigned once declared, and cannot be redeclared in the same scope.

The other two kinds of variable assignment in FLWOR are iteration and position; both are introduced using the for clause.

The for clause takes a variable, an optional type declaration, optionally the at keyword followed by a position variable, the in keyword (instead of assignment), and finally the XQuery expression to iterate through. Listing 6.6 demonstrates all of these parts.

The effect is that the variable, and the optional position variable if used, are both in scope for the remainder of the FLWOR expression. These variables change in value during the execution of the FLWOR statement; the variable is assigned to each item in the sequence in turn, and the position variable is assigned to the current position within this sequence.

Example 6.6. The for clause iterates through a sequence (optionally with position)

for $item at $pos in $sequence
return  <x index="{$pos}">{ $item }</x>

Each FLWOR expression may contain any number of variable bindings, in any order, as long as there is at least one let or for clause. XQuery also allows you to introduce several variables in a single clause, by separating each variable introduction with commas. Listing 6.7 demonstrates both variable styles.

Example 6.7. FLWOR expressions may introduce many variables

for $i in doc("one.xml")//foo, $j in $i//foo
let $k := $i//bar, $m := $j//bar
for $n in doc("two.xml")//foo
where count($k) > count($n//bar)
  and count($n//bar) > count($m)
return $n

The order in which these variables and clauses appear in the FLWOR expression is significant, as explained next.

Tuples

The tuple space is essential to understanding how FLWOR works. A tuple space is like a matrix: Each row is a tuple, and each row in the tuple space has the same number of columns as all the others. Tuples are similar to ordinary sequences, except that they may themselves contain sequences (i.e., tuples are nested).

Tuples and tuple spaces aren't formally part of the XQuery Data Model, but the FLWOR operator is defined (and best explained) in terms of them. During evaluation, each FLWOR expression corresponds to one tuple space.

Every variable introduced by for and let clauses generates another column in the tuple space. When visualizing tuple spaces, I also like to add columns for each of the where and return clauses, as well as each expression in the order by clause. For example, the FLWOR expression in Listing 6.8 corresponds to a tuple space with six columns: one each for the $i, $pos, and $j variables introduced by for and let, one for the where clause, one for the order by expression, and one for the return clause.

Example 6.8. A sample FLWOR

for $i at $pos in reverse (1 to 10)
let $j := $i * $i
where $j <= 50
order by $i
return ($pos, $i)

Determining the number of rows in the tuple space is somewhat more complicated. If there are only let clauses (no for clause), then there is one row in the tuple space. Otherwise, count the lengths of the sequences over which each for variable iterates, and multiply them together to get the number of rows.

In the example given in Listing 6.8, there are ten rows in the corresponding tuple space, because there is one for variable and it iterates over a sequence of length 10. The tuple space for this FLWOR expression is shown in Table 6.1. Notice that the last column contains a nested sequence.

Table 6.1. The tuple space corresponding to Listing 6.8

$i

$pos

$j

where

order by

return

10

1

100

false

10

(1, 10)

9

2

81

false

9

(2, 9)

8

3

64

false

8

(3, 8)

7

4

49

true

7

(4, 7)

6

5

36

true

6

(5, 6)

5

6

25

true

5

(6, 5)

4

7

16

true

4

(7, 4)

3

8

9

true

3

(8, 3)

2

9

4

true

2

(9, 2)

1

10

1

true

1

(10, 1)

Only rows for which the where clause is true are kept in the final result. So in this example, the first three rows will be omitted.

Next, the rows are sorted according to the order by clause. By default, the order by clause sorts in ascending order (you'll see other kinds of sorting in Section 6.6). In this example, sorting by the order by column reverses the order of the rows.

Finally, after filtering by the where clause and sorting by the order by clause, the return clauses are concatenated together in order (from top to bottom) into one flat sequence. Listing 6.9 shows the result of executing this FLWOR expression.

Example 6.9. The result of executing the FLWOR expression in Listing 6.8

(10, 1, 9, 2, 8, 3, 7, 4, 6, 5, 5, 6, 4, 7)

Visualizing tuple spaces becomes slightly more difficult when there are multiple for variables. Each subsequent for variable splits the existing rows. Consider, for example, Listing 6.10. This query constructs a tuple space with four columns and eight rows (the length of the first sequence times the length of the second sequence).

Example 6.10. A FLWOR expression that joins two sequences

for $x at $p in ("a", "b")
for $y in (1, 2, 3, 4)
where $p * $y > 5
return ($x, $y)

Looking just at the first variable, we would get two rows. Now split each of those rows into four rows (one row for each item in the second sequence), to get the total of eight rows shown in Table 6.2. Only the last two rows satisfy the where clause, so the final result is the sequence ("b", 3, "b", 4).

Table 6.2. The tuple space corresponding to Listing 6.10

$x

$p

$y

where

return

"a"

1

1

false

("a", 1)

"a"

1

2

false

("a", 2)

"a"

1

3

false

("a", 3)

"a"

1

4

false

("a", 4)

"b"

2

1

false

("b", 1)

"b"

2

2

false

("b", 2)

"b"

2

3

true

("b", 3)

"b"

2

4

true

("b", 4)

When the return clause or one of the variable bindings is itself a FLWOR expression, then the tuple spaces are nested. For example, the query in Listing 6.11 produces the nested tuple space shown in Table 6.3, because the first return clause itself returns a FLWOR.

Table 6.3. The tuple space corresponding to Listing 6.11

$i

return

7

The tuple space corresponding to Listing 6.11

9

The tuple space corresponding to Listing 6.11

As in the non-nested case, the return clauses are concatenated together in order (after filtering and sorting). The result of executing this query is the sequence (8, 9, 10, 10, 11, 12).

Example 6.11. A nested FLWOR expression

for $i in (7, 9)
return
  for $j in 1 to 3
  return $i + $j

Quantification

Before learning how to accomplish practical tasks such as joins and grouping using FLWOR, let's first discuss two related operators, some and every. These operators are like “mini-FLWORs” containing only for and where clauses, except that instead of for, these use the keywords some and every, and instead of returning a sequence of values, they return a single boolean value.

Both operators are followed by a variable name, the keyword in, the sequence to iterate over, the keyword satisfies, and finally the condition to be tested for each member of the sequence. The condition is implicitly converted to boolean using Effective Boolean Value. Both operators are illustrated in Listing 6.12.

Example 6.12. The some and every operators

some $emp in doc("team.xml")//Employee satisfies $emp/@years > 5
every $emp in doc("temp.xml")//Employee satisfies $emp/@years > 5

Like the for clause in FLWOR, variables introduced this way are in scope for the rest of the expression (in this case, the condition), and additional variables may be introduced by separating them with commas as in Listing 6.13.

Example 6.13. Quantification may introduce several variables at once

some $i in (1, 2), $j in (1, 3) satisfies $i = $j

The some operator implements what is known as existential quantification. This is just a fancy term meaning that it tests whether there exists a combination of variable bindings such that the condition is true. Similarly, the every operator implements universal quantification. It tests whether every combination of variable bindings satisfies the condition.

For example, the query in Listing 6.13 returns true, because there exists an assignment to $i (1) and an assignment to $j (1), such that the two are equal. However, the same query using every instead of some would return false, because then there exists an assignment to $i (1) and an assignment to $j (3) such that the condition doesn't hold.

These operators are entirely redundant with FLWOR, but they may allow you to express some queries using a more natural syntax (especially every, which is somewhat awkward when written as FLWOR). These equivalences are depicted in Listing 6.14.

Example 6.14. The some and every operators are equivalent to FLWOR

some variables satisfies condition
<=>
exists(for variables where condition return 1)

every variables satisfies condition
<=>
empty(for variables where not(condition) return 1)

Joins

The primary use of the FLWOR expression is to join one or more sequences together (or with themselves) to produce some result. The result may be drawn from the members of the sequences, like database joins, or constructed new in the query.

Joins express relationships between sequences. When two unrelated, independent sequences are combined, then we get a cross-product, the simplest kind of join. The XQuery in Listing 6.15 creates a cross-product of two sequences.

Example 6.15. The cross-product of two sequences

for $i in (1, 2, 3)
for $j in (3, 4, 5)
return ($i, $j)
=>
(1, 3, 1, 4, 1, 5, 2, 3, 2, 4, 2, 5, 3, 3, 3, 4, 3, 5)

More commonly, however, the join eliminates some items in the cross-product from consideration. For example, you might select only those members from both sets that are equal, like in Listing 6.16. This is called an inner join, because it includes only values that are in both sequences. (This particular example is also an equi-join, because it uses the equality operator. Most implementations specially optimize equi-joins.)

Example 6.16. An inner join of two sequences

for $i in (1, 2, 3)
for $j in (3, 4, 5)
where $i = $j
return ($i, $j)
=>
(3, 3)

A more realistic example would join two XML documents. So let's recall the team.xml document from Chapter 1 (Listing 1.26), and consider a second XML source, projects.xml, listed in Listing 6.17.

Example 6.17. The projects.xml document

<?xml version='1.0'?>
<Projects>
  <Project id="X1" owner="E2">
    <Name>Enter the Tuple Space</Name>
    <Category>Video Games</Category>
  </Project>
  <Project id="X2" owner="E1">
    <Name>Cryptic Code</Name>
    <Category>Puzzles</Category>
  </Project>
  <Project id="X3" owner="E5">
    <Name>XQuery Bandit</Name>
    <Category>Video Games</Category>
  </Project>
  <Project id="X4" owner="E3">
    <Name>Micropoly</Name>
    <Category>Board Games</Category>
  </Project>
</Projects>

A simple inner join between these documents is demonstrated in Listing 6.18. This query iterates through the Project elements from projects.xml and the Employee elements from team.xml and finds all pairs such that the project's owner attribute is equal to the employee's id attribute. This kind of join result is called a one-to-one join, because each item in the first sequence has (at most) one corresponding item in the second sequence.

To enhance readability, I've reformatted the result in this example and all the others in this chapter with extra whitespace (by using new lines and sometimes indentation).

Example 6.18. A one-to-one join between the projects.xml and team.xml documents

for $proj in doc("projects.xml")/Projects/Project
for $emp in doc("team.xml")//Employee
where $proj/@owner = $emp/@id
return $proj/Name, $emp/Name
=>
<Name>Enter the Tuple Space</Name>
<Name>Carl Yates</Name>
<Name>Cryptic Code</Name>
<Name>Kandy Konrad</Name>
<Name>XQuery Bandit</Name>
<Name>Jason Abedora</Name>
<Name>Micropoly</Name>
<Name>Jim Barry</Name>

There are several interesting directions to go from here, such as changing the join condition or the result format. Let's consider changing the result first, and then explore other kinds of join conditions.

In this example, the output is somewhat confusing because the natural pairing of employees and projects has been lost when the tuple space result was flattened (remember, sequences in XQuery aren't nested). To achieve a hierarchical result, we need to use XML. In Listing 6.19, we wrap each pair with a new Assignment element. The curly braces inside the element cause its contents to be evaluated as XQuery expressions. (See Chapter 7 for details about XML construction.)

Again, whitespace has been inserted to make the results easier to read.

Example 6.19. The same join as Listing 6.18, but using XML to group results

for $proj in doc("projects.xml")/Projects/Project
for $emp in doc("team.xml")//Employee
where $proj/@owner = $emp/@id
return <Assignment>{ $proj/Name, $emp/Name }</Assignment>
=>
<Assignment>
  <Name>Enter the Tuple Space</Name>
  <Name>Carl Yates</Name>
</Assignment>
<Assignment>
  <Name>Cryptic Code</Name>
  <Name>Kandy Konrad</Name>
</Assignment>
<Assignment>
  <Name>XQuery Bandit</Name>
  <Name>Jason Abedora</Name>
</Assignment>
<Assignment>
  <Name>Micropoly</Name>
  <Name>Jim Barry</Name>
</Assignment>

Another confusing aspect about the result is that there are two Name elements with different meanings. Conflicting node names (and just a general desire to change names) are common when joining XML data sources together. Listing 6.20 uses the XML constructor syntax to create attributes instead of copying the elements from the original document. (See Chapter 7 for more examples of XML constructors.)

Example 6.20. The same join as Listings 6.18 and 6.19, but reshaping the results

for $proj in doc("projects.xml")/Projects/Project
for $emp in doc("team.xml")//Employee
where $proj/@owner = $emp/@id
return
<Assignment proj="{$proj/Name}" emp="{$emp/Name}" />
=>
<Assignment proj="Enter the Tuple Space" emp="Carl Yates" />
<Assignment proj="Cryptic Code"          emp="Kandy Konrad" />
<Assignment proj="XQuery Bandit"         emp="Jason Abedora" />
<Assignment proj="Micropoly"             emp="Jim Barry" />

Other kinds of joins are possible. For example, Listing 6.21 shows a many-to-many join, so named because many items from the first sequence may correspond to many items from the second sequence.

Example 6.21. A many-to-many join between team.xml and projects.xml

for $proj in doc("projects.xml")/Projects/Project
for $emp in doc("team.xml")//Employee
where $proj/Category = $emp/Expertise
return
<Assignment proj="{$proj/Name}" emp="{$emp/Name}" />
=>
<Assignment proj="Enter the Tuple Space" emp="Carl Yates" />
<Assignment proj="Enter the Tuple Space" emp="Jim Barry" />
<Assignment proj="Cryptic Code"          emp="Chaz Hoover" />
<Assignment proj="Cryptic Code"          emp="Jason Abedora" />
<Assignment proj="Cryptic Code"          emp="Wanda Wilson" />
<Assignment proj="XQuery Bandit"         emp="Carl Yates" />
<Assignment proj="XQuery Bandit"         emp="Jim Barry" />
<Assignment proj="Micropoly"             emp="Wanda Wilson" />

Many-to-many joins are problematic in XML, in part because node values are duplicated. In small examples like these, this duplication may not seem like much of a problem, but in large examples it can significantly affect the size of the result. It can also create problems with node identity, because now the “same” node is copied multiple times to the output.

One way to partially address this problem is to change the join into an outer join. A left outer join includes members from the outer (left) sequence, even when there isn't a member of the inner (right) sequence satisfying the join condition. Similarly, a right outer join keeps items from the inner sequence even when there isn't a corresponding member of the outer sequence. A full outer join keeps items from both sequences.

This technique is illustrated in Listing 6.22, using an outer FLWOR statement to iterate over the first sequence and create one element for each project. An inner FLWOR performs the join, returning zero or more employee names. Separating the join into two FLWOR statements, with the outer one always returning something, is what makes this an outer join.

Example 6.22. An outer join between the team.xml and projects.xml documents

for $proj in doc("projects.xml")/Projects/Project
return
<Assignment proj="{$proj/Name}">{
  for $emp in doc("team.xml")//Employee
  where $emp/Expertise = $proj/Category
  return $emp/Name
}</Assignment>
=>
<Assignment proj="Enter the Tuple Space">
  <Name>Carl Yates</Name>
  <Name>Jim Barry</Name>
</Assignment>
<Assignment proj="Cryptic Code">
  <Name>Chaz Hoover</Name>
  <Name>Jason Abedora</Name>
  <Name>Wanda Wilson</Name>
</Assignment>
<Assignment proj="XQuery Bandit">
  <Name>Carl Yates</Name>
  <Name>Jim Barry</Name>
</Assignment>
<Assignment proj="Micropoly">
  <Name>Wanda Wilson</Name>
</Assignment>

Although the outer join no longer duplicates project names, it still duplicates employee names. In other words, it is a one-to-many join.

The traditional “fix” for representing one-to-many and many-to-many relationships in XML is to select each element exactly once, and to use xs:ID/xs:IDREF values in the XML output to express the relationship. This technique requires that the elements being referred to have unique xs:ID values.

Listing 6.23 demonstrates a FLWOR query that generates a list of employee IDs for each project. This data can then be used with navigation functions like id() to find the corresponding Employee elements, instead of duplicating them in the result. This technique has the drawback that two joins will be performed: one to produce the result, and another to consume it. However, typically implementation optimizes ID lookup so that it's faster than arbitrary joins.

Example 6.23. Use xs:ID relationships instead of duplicating elements

for $proj in doc("projects.xml")/Projects/Project
return
<Assignment proj="{$proj/Name}" emps="{
    for $emp in doc('team.xml')//Employee
    where $emp/Expertise = $proj/Category
    return $emp/@id
  }"/>
=>
<Assignment proj="Enter the Tuple Space" emps="E2 E3" />
<Assignment proj="Cryptic Code"          emps="E6 E5 E3" />
<Assignment proj="XQuery Bandit"         emps="E2 E3" />
<Assignment proj="Micropoly"             emps="E0" />

Not all joins are equi-joins. The query in Listing 6.24 finds all projects for which the owner doesn't have expertise in that project category. (Chapter 11 explains why this query uses not(=) instead of !=.)

Example 6.24. Not all joins are equi-joins

for $proj in doc("projects.xml")/Projects/Project
for $emp in doc("team.xml")//Employee
where $proj/@owner = $emp/@id
  and not($proj/Category = $emp/Expertise)
return <Mismatch>{$emp/Name, $proj/Name}</Mismatch>
=>
<Mismatch>
  <Name>Kandy Konrad</Name>
  <Name>Cryptic Code</Name>
</Mismatch>
<Mismatch>
  <Name>Jason Abedora</Name>
  <Name>XQuery Bandit</Name>
</Mismatch>
<Mismatch>
  <Name>Jim Barry</Name>
  <Name>Micropoly</Name>
</Mismatch>

Another kind of join is the self-join, which joins a sequence with itself. Self-joins are very common when working with recursive data, such as in the team.xml document. In fact, we've already seen one example in Chapter 3 already, copied here as Listing 6.25.

Example 6.25. A self-join using path navigation

doc("team.xml")//Employee[Title =
                   doc("team.xml")//Employee[@id="E0"]/Title]/Name

The meaning of this path may be more easily discerned when it's written as a FLWOR expression, like in Listing 6.26. Both queries do the same work, but the second query using FLWOR makes the join more explicit.

Example 6.26. The same self-join as Listing 6.25 using FLWOR instead

let $emp := doc("team.xml")//Employee
for $i in $emp, $j in $emp
where $i/Title = $j/Title and $j/@id = "E0"
return $i/Name

Comparing Sequences

There are many different ways to compare two sequences: member-wise, existentially, and universally to name just three. This section shows how to perform these different kinds of sequence comparisons. Effectively, each kind of sequence comparison is just another kind of join.

Existential Comparison

In Chapter 5 you learned that the general comparison operators compare sequences existentially. They test whether there exists a member in each sequence such that the comparison is true. For example, to test whether there exists a member of one sequence that is greater than a member of a second sequence, you can write $seq1 > $seq2.

However, the general comparison operators can only perform simple comparisons, and not more complex conditions. Fortunately, XQuery provides many other ways to express existential comparison.

One is to use the some operator. For example, the previous comparison could also have been expressed as some $a in $seq1, $b in $seq2 satisfies $a > $b. However, some is more flexible because it allows you to express other kinds of conditions. For example, suppose that instead of merely comparing one member against another, you want to test whether it's greater by a certain amount (say, 3). Then you could use the query some $a in $seq1, $b in $seq2 satisfies $a > $b + 3.

Path expressions are a simpler but somewhat less expressive way to perform existential comparisons, applying a predicate condition to each node in a step. For example, you could write exists($seq1[. > $seq2]). Like some, paths can express more complex conditions. For instance, you could express the same comparison using some using the path exists($seq1[ . - 3 > $seq2]). Paths are very concise, but can be harder to read and can have unintended consequences. For example, the expression exists($seq1[ . > $seq2 + 3]) results in an error because addition cannot be applied to a sequence of more than one item.

The most general-purpose way to perform an existential comparison is using FLWOR expressions. For example, you could write for $a in $seq1, $b in $seq2 where $a > $seq2 return $a. This not only compares the two sequences, but also returns all the members of the first sequence that satisfied the condition. To test that at least one such member exists, just wrap the FLWOR expression with the exists() function. Listing 6.27 compares these three styles side-by-side.

Example 6.27. Three ways to existentially compare two sequences

some $a in $seq1, $b in $seq2 satisfies $a > $b
$a[. > $seq2]
exists(for $a in $seq1, $b in $seq2 where $a > $b return $a)

Memberwise Comparison

The deep-equal()function compares sequences memberwise. It tests whether the two sequences have the same length, and if so, whether each pair of corresponding members compares true. However, this function always performs deep equality. XQuery doesn't have a built-in function that performs memberwise comparison by node identity, or that performs any other kind of memberwise action.

For instance, suppose that you want to take two sequences of numbers and add or compare them memberwise. The easiest way to express memberwise operations in XQuery is to use a join on position. The idea is to keep track of which position you are at as you iterate through the first sequence, so that you can pick out the member of the second sequence that is at the same location.

For example, the function in Listing 6.28 is similar to deep-equal(), except that it compares nodes by identity instead of using deep equality.

Example 6.28. deep-equal(), without the deep

declare function shallow-equal($seq1 as node()*,
                              $seq2 as node()*) as xs:boolean {
  if (count($seq1) != count($seq2))
  then false()
  else
    empty(
      for $i at $p1 in $seq1
      for $j at $p2 in $seq2
      where $p1 = $p2 and $i is not $j
      return 1
    )
};

For another example, the query in Listing 6.29 returns true if every member of the first sequence is greater than the corresponding member of the second sequence.

Example 6.29. Compare two sequences memberwise

count($seq1) = count($seq2)
and empty(
  for $i at $p1 in $seq1
  for $j at $p2 in $seq2
  where $p1 = $p2 and not($i > $j)
  return 1
)

These examples could also be expressed using predicates to apply an implicit join. For example, Listing 6.30 is another way to achieve the same effect as in Listing 6.29.

Example 6.30. An equivalent way to compare two sequences memberwise

count($seq1) = count($seq2)
and empty(
  for $i at $p in $seq1
  where not($i > $seq2[$p])
  return 1
)

You may notice the similarity between this query and the every operator. Unfortunately, every doesn't provide a way to introduce position variables, but you can compensate for this by using the to operator to range up to the length of the sequence, as in Listing 6.31.

Example 6.31. Yet another way to produce the same effect

count($seq1) = count($seq2)
and
every $p in 1 to count($seq1) satisfies $seq1[$p] > $seq2[$p]

Another consideration when writing these comparisons is the decision as to how to handle exceptional conditions: If one sequence is longer than the other, do you want to return false, raise an error, or ignore the extra members?

Because $seq[$pos] returns the empty sequence when $pos goes beyond the end of the sequence, you could omit the count() comparison to ignore additional members when the sequences don't have the same length, as in Listing 6.32. Alternatively, you can use a conditional and the error() function to raise an error when the sequences differ in length.

Example 6.32. Ignore leftover members

empty(for $i at $p in $seq1 where not($i > $seq2[$p]) return 1)

Universal Comparison

Another way to compare sequences is to require that a condition be true for every member of the sequence. Section 6.3 already showed how to perform universal quantification using the every operator (or an equivalent FLWOR). Like existential quantification, universal quantification can be expressed in several forms, some of which are shown in Listing 6.33.

Example 6.33. Universal quantification, like existential, has many forms

every $a in $seq1, $b in $seq2 satisfies $a gt $b

empty( for $a in $seq1
       where not($a > $seq2)
       return $a )

empty($seq1[not(. > $seq2)])

Another option is what might be called universal memberwise comparison: Require every member to compare true against its corresponding member in the other sequence. As before, you have to decide how to handle the case that the sequences differ in length. The example in Listing 6.34 returns false if the two sequences differ in length. It uses the fact that every pair of members satisfies a condition if no pair of them fails it.

Example 6.34. Universal memberwise comparison

count($seq1) = count($seq2)
and empty( for $i at $pos in $seq1
           where not($i > $seq2[$pos])
           return $i )+

Sorting

FLWOR expressions have a natural ordering, which is determined by the bindings of the for variables (the row order of the tuple space). However, the order by clause may be used to apply some other ordering. The order by clause takes a list of expressions, called sort keys.

The order by clause sorts the tuple space according to each of the sort keys in order. It's important to understand that it's the rows of the tuple space that are sorted, not the final (flattened) result. The sort keys don't have to be returned in the result. Listing 6.35 demonstrates a simple sort.

Example 6.35. The order by clause sorts the result (after the where clause)

for $i in (4, 2, 3, 1)
order by $i
return $i
=>
(1, 2, 3, 4)

Each sort key is first evaluated and atomized; if the result isn't empty or a singleton, then an error is raised. Untyped values are cast to xs:string. A sort key must evaluate to the same type for every tuple in the tuple space, and the type must be totally ordered (i.e., support the gt operator). Sorting depends on the gt value comparison operator, which depends on the collation when comparing xs:string values. The example in Listing 6.36 may produce a different result from the one given here, depending on the default collation.

Example 6.36. Sorting by string values depends on the default collation

for $i in doc("team.xml")//Employee
where exists($i//Employee)
order by $i/@id descending
return $i/Name
=>
<Name>Chaz Hoover</Name>
<Name>Carl Yates</Name>
<Name>Kandy Konrad</Name>

When the first sort key evaluated for two tuples results in a tie, then subsequent sort keys are used to break the tie. If all the keys tie, then the order of the tied tuples is implementation-defined unless the stable keyword is used in front of the order by keyword. In that case, the tied tuples retain their original ordering relative to each other. In Listing 6.37, the first two results tied, so their original relative order is preserved.

Example 6.37. Stable sorting preserves the original order of ties

for $i in (<a b="z">2</a>, <a b="x">1</a>, <a b="y">1</a>)
stable order by $i
return $i
=>
<a b="x">1</a>
<a b="y">1</a>
<a b="z">2</a>

Each sort key expression may be followed by modifiers that control how that key affects the sort order. The most common modifiers are ascending and descending (ascending is the default) with the obvious effect. The other modifiers, empty least and empty greatest, control how the empty sequence and NaN sort relative to all other values.

When empty least is specified, then the empty sequence sorts less than all other non-empty values, and NaN sorts less than all other non-empty and non-NaN values. When empty greatest is specified, then the empty sequence sorts greater than all other non-empty values, and NaN sorts greater than all other non-empty and non-NaN values. Listing 6.38 demonstrates the interaction between empty least and these special values.

Example 6.38. Empty least and greatest affect how NaN and () sort

for $i in (1E0, 2E0, 3E0, 0E0 div 0)
let $key := if ($i < 2.5) then $i else ()
order by $key empty least, $i descending
return $i
=>
(2E0, 1E0, NaN, 3E0)

These modifiers are demonstrated by Listing 6.38, which generates the ordered tuple space illustrated in Table 6.4. Because the first sort key has empty least, the two empty sequences sort least, followed by NaN, followed by 3E0. Then the second key is used to break the tie between the first two values. Because it has descending, 2E0 sorts before 1E0, producing the final result shown.

Table 6.4. The ordered tuple space corresponding to Listing 6.38

$I

$key

return

2E0

()

2E0

1E0

()

1E0

NaN

NaN

NaN

3E0

3E0

3E0

When sorting by string values, it's important to pay attention to Unicode normalization and collation. For example, you may want letters to sort in a case-insensitive way, or you may want alternate spellings of the same phonetic symbol to sort the same way. The former is controlled by collation, while the latter is controlled by Unicode normalization. Support for both of these depends partly on the implementation (see Chapter 8 for details). You can also apply string functions such as lower-case() in the sort key, although this will generally perform less well than using collation.

Grouping

When joining two sequences together, sometimes you also want to sort the result by something that depends on the result. For example, you might want to group cities by region with regions sorted by the sum of their city populations, or list team leads by the number of people or the average years of experience accumulated on their team.

Queries like these follow a grouping pattern, and some languages (such as SQL) have dedicated operators to support them. Other languages, such as XSLT and XQuery, express group queries just like any other query, and then implementations vary in how well they recognize the grouping patterns and execute them.

For example, consider listing team leads in decreasing order by the number of direct reports they have. The query in Listing 6.39 performs this grouping.

Example 6.39. A simple grouping query that sorts team leads by team size

for $i in doc("team.xml")//Employee
let $reports := count($i/Employee)
where $reports > 0
order by $reports descending
return <Employee name="{$i/Name}" reports="{$reports}"/>
=>
<Employee name="Chaz Hoover" reports="2"/>
<Employee name="Carl Yates" reports="2"/>
<Employee name="Kandy Konrad" reports="1"/>

In this query, we first iterate over all Employee elements in the team.xml document. Then, for each employee, we calculate the number of direct reports (that is, the number of Employee element children). We filter the result with a where clause to eliminate employees who have no direct reports. The order by clause reverse sorts by the number of direct reports, and finally the return clause constructs a meaningful result.

Instead, we could reverse sort the list by the average number of years on each team. This is a somewhat more complicated query, in part because we must iterate over all the Employee descendants and calculate their average years of experience. Because the avg() function takes only sequences of atomic values, when we pass it the sequence of year attributes, it first applies the function conversion rules (Chapter 2), which atomizes this list of nodes and produces a sequence of values. We use the descendant-or-self axis directly instead of the abbreviation //, so that we include the lead's years of experience in the average. The round-half-to-even() function is used to round the result to two decimal places.

Example 6.40. Team leads reverse sorted by average years of experience on their team

for $i in doc("team.xml")//Employee
let $years := avg($i/descendant-or-self::Employee/@years)
where exists($i/Employee)
order by $years descending
return <Employee name="{$i/Name}"
                 years="{round-half-to-even($years, 2)}"/>
=>
<Employee name="Chaz Hoover" years="4.82"/>
<Employee name="Carl Yates" years="2.63"/>
<Employee name="Kandy Konrad" years="8.35"/>

Conceptually, this query produces the nested tuple space depicted in Table 6.5. In the interest of space, this table shows only the employee ID values instead of the entire Employee nodes. The $years column requires an implicit join between each Employee and its descendants, and then computes the average, which is listed in the order by column. In practice, some implementations may optimize queries like these to compute the aggregate on the fly.

Table 6.5. The tuple space corresponding to Listing 6.40

$i/@id

$years

Where

order by

return

0E6

The tuple space corresponding to Listing 6.40

true

4.8166666

<Employee name="Chaz Hoover" years="4.82"/>

E2

The tuple space corresponding to Listing 6.40

true

2.6333333

<Employee name="Carl Yates" years="2.63"/>

E4

The tuple space corresponding to Listing 6.40

false

  

E5

The tuple space corresponding to Listing 6.40

false

  

E1

The tuple space corresponding to Listing 6.40

true

8.35

<Employee name="Kandy Konrad" years="8.35"/>

E0

The tuple space corresponding to Listing 6.40

false

  

E3

The tuple space corresponding to Listing 6.40

false

  

It's important to note also that the order by and return clauses aren't evaluated when the where clause is false (avoiding, for example, potential error cases).

You might also like to generate a nice XHTML table containing the results, with two employees per line. In this case, you want to group the results by position but after they have been sorted as above. This can be accomplished by applying another FLWOR query to the original one, like in Listing 6.41.

Example 6.41. Generating a table of the results, grouped into pairs

<table>
  <tr>
    <th>Employee</th><th>years</th>
    <th>Employee</th><th>years</th>
  </tr>
{
  let $emps :=
    (for $i in doc("team.xml")//Employee
     let $years := avg($i/descendant-or-self::Employee/@years)
     where exists($i/Employee)
     order by $years descending
     return <Employee name="{$i/Name}"
                      years="{round-half-to-even($years, 2)}"/>)
  for $e at $pos in $emps
  let $next := $emps[$pos + 1]
  where $pos mod 2 = 0
  return
  <tr>
    <td>{string($e/@Name)}</td><td>{string($e/@years)}</td>
    <td>{string($next/@Name)}</td><td>{string($next/@years)}</td>
  </tr>
}</table>

Conclusion

The XQuery FLWOR expression introduces variables using let, iterates over sequences using for, filters and joins sequences using where, sorts the remaining items using order by, and constructs a result using return. FLWOR is the central expression of most queries, and the most powerful.

The some and every expressions are simplified FLWOR statements that test whether some or every member of a sequence (or sequences) satisfy some condition. They can be convenient when trying to write easy-to-read queries.

Further Reading

Every XQuery developer should own a copy of An Introduction to Database Systems by C. J. Date, currently in its 8th edition. Although focused on the relational data model, this classic contains much useful information on joins, grouping, and data in general.

The FLWOR expression comes directly from Quilt, which was a precursor to XQuery. For a historical perspective on Quilt, check out the Web site at http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.135.224