Most programming languages adopt one of two styles of expressions: iteration or application. Iterative expressions explicitly loop through sequences one member at a time. Applicative expressions apply operations to entire sequences at once (implicitly iterating through the sequence and applying the operation to each member).
As we saw in Chapter 3, navigation paths have an applicative style—each step implicitly iterates over the nodes selected by the previous step. However, the rest of XQuery has adopted an iterative style of programming. This chapter explores the XQuery expressions that perform iteration.
The FLWOR (pronounced “flower”) expression is the most powerful of these, capable of introducing variables, filtering sequences, sorting and grouping, iterating over sequences to produce some result, and joining different data sources, all at once.
XQuery also provides iteration expressions for existential (some
) and universal (every
) quantification, which test whether a condition holds for any or every member of a sequence, respectively.
As its name suggests, FLWOR consists of five major clauses: for
, let
, where
, order by
, and return
. Every FLWOR begins with one or more for
and/or let
clauses (in any order), followed by an optional where
clause, an optional order by
clause, and finally one return
clause. The five clauses are demonstrated by the example in Listing 6.1.
Example 6.1. FLWOR consists of five clauses
for $i at $pos in doc("puzzles.xml")//puzzle let $j := doc("games.xml")//game where $i/@name = $j/@name order by @price descending return $i
Before I explain how FLWOR works, let's first compare FLWOR to equivalent expressions in the popular SQL and XSLT languages. The XQuery FLWOR expression is very similar to SQL SELECT
statements and XSLT <xsl:for-each/>
loops. If you are already familiar SQL or XSLT, then you may find it helpful to think of FLWOR in terms of the equivalent expressions from these languages. If you're not already familiar with SQL or XSLT, you may want to skip ahead to Section 6.2.3.
The similarities between SQL and XQuery are striking. Both XQuery and SQL have a where
clause, which filters the overall expression according to a boolean condition. Both XQuery and SQL have an order by
clause, which sorts the selection according to some criteria. Both languages also can declare variables and assign them to values. Listing 6.2 shows a SQL query similar to FLWOR.
Example 6.2. SQL queries also contain five clauses
DECLARE @Variable1, @Variable2, ... SELECT Column1, Column2, ... FROM Table1, Table2, ... WHERE ... ORDER BY ...
Where the languages differ—aside from the fact that SQL is generally case-insensitive while XQuery is case-sensitive—are the ways in which they iterate over data and construct results.
In SQL, all sequences are flat. Consequently, a simple FROM
clause naming tables (or views, or subqueries) suffices to describe the data sources, and a simple SELECT
clause suffices to list the values to be selected from these sources. SQL is nonpositional, so you can be vague about the order of iteration over these data sources.
In XQuery, the sequences may be flat or they may be hierarchical, and they are almost always ordered. Consequently, the for
clause may simply name collections like in SQL), or it may describe paths or other more complex expressions that navigate these collections, and the order in which the iteration is performed is significant. Similarly, the return
clause doesn't simply return values from these data sources, but may construct entirely new shapes (which are also ordered).
The similarities between SQL and XQuery aren't accidental. SQL has been a highly successful language for data processing, and its features have evolved to fulfill common needs. XQuery has these same needs and more, and therefore provides functionality that includes and surpasses that of SQL.
The correspondence between XSLT and XQuery may be less obvious, but they are actually more closely related than SQL and XQuery, because both XSLT and XQuery both work with hierarchical, ordered data sources. In fact, many people have questioned the need for XQuery at all, when XSLT 1.0 can express so many of the same concepts.
The selection criteria of the <xsl:for-each>
loop is similar to the for
clause of FLWOR. The main difference is that a single FLWOR expression may have any number of for
clauses, but you have to nest <xsl:for-each>
loops to achieve the same effect. Variable assignment in XSLT is performed using <xsl:variable>
and is similar to the XQuery let
clause.
The where
clause of XQuery can be achieved in XSLT using either <xsl:if>
or else predicates on the select expression. The order by
clause of XQuery loosely corresponds to the <xsl:sort>
expression of XSLT, and the return
clause in a FLWOR expression corresponds to the body of the <xsl:for-each>
loop. Listing 6.3 shows an XSLT expression similar to the XQuery FLWOR expression.
Example 6.3. XSLT for-each
loops are similar to FLWOR
<xsl:for-each select="$sequence"> <xsl:variable name="i" value="./@attr"/> <xsl:sort /> <return/>result here</return> </xsl:for-each>
The most obvious difference between XSLT and XQuery is that XSLT is expressed using XML while XQuery is not. Other less obvious but even more significant differences are that XQuery allows sequences of atomic values, has a richer type system, and allows composition (iterating over the results of other expressions). It is also somewhat easier to join two data sources in XQuery than in XSLT, or to express certain kinds of grouping operations.
As with SQL, the similarities between XSLT and XQuery aren't accidental. Both languages must navigate and construct XML, and must handle all of its nuances. In many ways, XQuery is a refinement of the ideas that preceded it.
The FLWOR expression is one of several XQuery expressions that introduces variable assignments. FLWOR actually introduces three kinds of variable bindings.
One kind of variable assignment in FLWOR occurs with the let
clause. In fact, the simplest possible FLWOR expression consists of a single let
clause followed by a return
clause (see Listing 6.4).
Example 6.4. The simplest FLWOR expression consists of let and return
let $variable := "value" return $variable
The let
keyword is followed by a variable ($name
), optionally the variable's type, the assignment operator (:=
), and any XQuery expression at all (even another FLWOR).
The effect is to introduce the variable named by the let
clause into scope and assign it to the given expression (the binding). As shown in Listing 6.5, each variable is in scope for all the rest of the FLWOR expression—that is, the where
, order by
, and return
clauses, and any other variables introduced after it. (However, a variable cannot refer to itself in its own binding.)
Example 6.5. Variables are in scope for all the rest of the FLWOR
let $x := 1 let $y := $x where $x > 0 or $y < 0 order by $x, $y return ($x, $y)
Variables cannot be reassigned once declared, and cannot be redeclared in the same scope.
The other two kinds of variable assignment in FLWOR are iteration and position; both are introduced using the for
clause.
The for
clause takes a variable, an optional type declaration, optionally the at
keyword followed by a position variable, the in
keyword (instead of assignment), and finally the XQuery expression to iterate through. Listing 6.6 demonstrates all of these parts.
The effect is that the variable, and the optional position variable if used, are both in scope for the remainder of the FLWOR expression. These variables change in value during the execution of the FLWOR statement; the variable is assigned to each item in the sequence in turn, and the position variable is assigned to the current position within this sequence.
Example 6.6. The for
clause iterates through a sequence (optionally with position)
for $item at $pos in $sequence return <x index="{$pos}">{ $item }</x>
Each FLWOR expression may contain any number of variable bindings, in any order, as long as there is at least one let
or for
clause. XQuery also allows you to introduce several variables in a single clause, by separating each variable introduction with commas. Listing 6.7 demonstrates both variable styles.
Example 6.7. FLWOR expressions may introduce many variables
for $i in doc("one.xml")//foo, $j in $i//foo let $k := $i//bar, $m := $j//bar for $n in doc("two.xml")//foo where count($k) > count($n//bar) and count($n//bar) > count($m) return $n
The order in which these variables and clauses appear in the FLWOR expression is significant, as explained next.
The tuple space is essential to understanding how FLWOR works. A tuple space is like a matrix: Each row is a tuple, and each row in the tuple space has the same number of columns as all the others. Tuples are similar to ordinary sequences, except that they may themselves contain sequences (i.e., tuples are nested).
Tuples and tuple spaces aren't formally part of the XQuery Data Model, but the FLWOR operator is defined (and best explained) in terms of them. During evaluation, each FLWOR expression corresponds to one tuple space.
Every variable introduced by for
and let
clauses generates another column in the tuple space. When visualizing tuple spaces, I also like to add columns for each of the where
and return
clauses, as well as each expression in the order by
clause. For example, the FLWOR expression in Listing 6.8 corresponds to a tuple space with six columns: one each for the $i
, $pos
, and $j
variables introduced by for
and let
, one for the where
clause, one for the order by
expression, and one for the return
clause.
Example 6.8. A sample FLWOR
for $i at $pos in reverse (1 to 10) let $j := $i * $i where $j <= 50 order by $i return ($pos, $i)
Determining the number of rows in the tuple space is somewhat more complicated. If there are only let
clauses (no for
clause), then there is one row in the tuple space. Otherwise, count the lengths of the sequences over which each for
variable iterates, and multiply them together to get the number of rows.
In the example given in Listing 6.8, there are ten rows in the corresponding tuple space, because there is one for
variable and it iterates over a sequence of length 10. The tuple space for this FLWOR expression is shown in Table 6.1. Notice that the last column contains a nested sequence.
Table 6.1. The tuple space corresponding to Listing 6.8
$i | $pos | $j | where | order by | return |
---|---|---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Only rows for which the where
clause is true are kept in the final result. So in this example, the first three rows will be omitted.
Next, the rows are sorted according to the order by clause. By default, the order by
clause sorts in ascending order (you'll see other kinds of sorting in Section 6.6). In this example, sorting by the order by
column reverses the order of the rows.
Finally, after filtering by the where
clause and sorting by the order by
clause, the return
clauses are concatenated together in order (from top to bottom) into one flat sequence. Listing 6.9 shows the result of executing this FLWOR expression.
Example 6.9. The result of executing the FLWOR expression in Listing 6.8
(10, 1, 9, 2, 8, 3, 7, 4, 6, 5, 5, 6, 4, 7)
Visualizing tuple spaces becomes slightly more difficult when there are multiple for
variables. Each subsequent for
variable splits the existing rows. Consider, for example, Listing 6.10. This query constructs a tuple space with four columns and eight rows (the length of the first sequence times the length of the second sequence).
Example 6.10. A FLWOR expression that joins two sequences
for $x at $p in ("a", "b") for $y in (1, 2, 3, 4) where $p * $y > 5 return ($x, $y)
Looking just at the first variable, we would get two rows. Now split each of those rows into four rows (one row for each item in the second sequence), to get the total of eight rows shown in Table 6.2. Only the last two rows satisfy the where
clause, so the final result is the sequence ("b", 3, "b", 4)
.
Table 6.2. The tuple space corresponding to Listing 6.10
$x | $p | $y | where | return |
---|---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
When the return
clause or one of the variable bindings is itself a FLWOR expression, then the tuple spaces are nested. For example, the query in Listing 6.11 produces the nested tuple space shown in Table 6.3, because the first return
clause itself returns a FLWOR.
As in the non-nested case, the return clauses are concatenated together in order (after filtering and sorting). The result of executing this query is the sequence (8, 9, 10, 10, 11, 12)
.
Before learning how to accomplish practical tasks such as joins and grouping using FLWOR, let's first discuss two related operators, some
and every
. These operators are like “mini-FLWORs” containing only for
and where
clauses, except that instead of for
, these use the keywords some
and every
, and instead of returning a sequence of values, they return a single boolean value.
Both operators are followed by a variable name, the keyword in
, the sequence to iterate over, the keyword satisfies
, and finally the condition to be tested for each member of the sequence. The condition is implicitly converted to boolean using Effective Boolean Value. Both operators are illustrated in Listing 6.12.
Example 6.12. The some
and every
operators
some $emp in doc("team.xml")//Employee satisfies $emp/@years > 5 every $emp in doc("temp.xml")//Employee satisfies $emp/@years > 5
Like the for
clause in FLWOR, variables introduced this way are in scope for the rest of the expression (in this case, the condition), and additional variables may be introduced by separating them with commas as in Listing 6.13.
Example 6.13. Quantification may introduce several variables at once
some $i in (1, 2), $j in (1, 3) satisfies $i = $j
The some
operator implements what is known as existential quantification. This is just a fancy term meaning that it tests whether there exists a combination of variable bindings such that the condition is true. Similarly, the every
operator implements universal quantification. It tests whether every combination of variable bindings satisfies the condition.
For example, the query in Listing 6.13 returns true, because there exists an assignment to $i
(1
) and an assignment to $j
(1
), such that the two are equal. However, the same query using every
instead of some
would return false, because then there exists an assignment to $i
(1
) and an assignment to $j
(3
) such that the condition doesn't hold.
These operators are entirely redundant with FLWOR, but they may allow you to express some queries using a more natural syntax (especially every
, which is somewhat awkward when written as FLWOR). These equivalences are depicted in Listing 6.14.
The primary use of the FLWOR expression is to join one or more sequences together (or with themselves) to produce some result. The result may be drawn from the members of the sequences, like database joins, or constructed new in the query.
Joins express relationships between sequences. When two unrelated, independent sequences are combined, then we get a cross-product, the simplest kind of join. The XQuery in Listing 6.15 creates a cross-product of two sequences.
Example 6.15. The cross-product of two sequences
for $i in (1, 2, 3) for $j in (3, 4, 5) return ($i, $j) => (1, 3, 1, 4, 1, 5, 2, 3, 2, 4, 2, 5, 3, 3, 3, 4, 3, 5)
More commonly, however, the join eliminates some items in the cross-product from consideration. For example, you might select only those members from both sets that are equal, like in Listing 6.16. This is called an inner join, because it includes only values that are in both sequences. (This particular example is also an equi-join, because it uses the equality operator. Most implementations specially optimize equi-joins.)
Example 6.16. An inner join of two sequences
for $i in (1, 2, 3) for $j in (3, 4, 5) where $i = $j return ($i, $j) => (3, 3)
A more realistic example would join two XML documents. So let's recall the team.xml
document from Chapter 1 (Listing 1.26), and consider a second XML source, projects.xml
, listed in Listing 6.17.
Example 6.17. The projects.xml document
<?xml version='1.0'?> <Projects> <Project id="X1" owner="E2"> <Name>Enter the Tuple Space</Name> <Category>Video Games</Category> </Project> <Project id="X2" owner="E1"> <Name>Cryptic Code</Name> <Category>Puzzles</Category> </Project> <Project id="X3" owner="E5"> <Name>XQuery Bandit</Name> <Category>Video Games</Category> </Project> <Project id="X4" owner="E3"> <Name>Micropoly</Name> <Category>Board Games</Category> </Project> </Projects>
A simple inner join between these documents is demonstrated in Listing 6.18. This query iterates through the Project
elements from projects.xml
and the Employee
elements from team.xml
and finds all pairs such that the project's owner attribute is equal to the employee's id attribute. This kind of join result is called a one-to-one join, because each item in the first sequence has (at most) one corresponding item in the second sequence.
To enhance readability, I've reformatted the result in this example and all the others in this chapter with extra whitespace (by using new lines and sometimes indentation).
Example 6.18. A one-to-one join between the projects.xml and team.xml documents
for $proj in doc("projects.xml")/Projects/Project for $emp in doc("team.xml")//Employee where $proj/@owner = $emp/@id return $proj/Name, $emp/Name => <Name>Enter the Tuple Space</Name> <Name>Carl Yates</Name> <Name>Cryptic Code</Name> <Name>Kandy Konrad</Name> <Name>XQuery Bandit</Name> <Name>Jason Abedora</Name> <Name>Micropoly</Name> <Name>Jim Barry</Name>
There are several interesting directions to go from here, such as changing the join condition or the result format. Let's consider changing the result first, and then explore other kinds of join conditions.
In this example, the output is somewhat confusing because the natural pairing of employees and projects has been lost when the tuple space result was flattened (remember, sequences in XQuery aren't nested). To achieve a hierarchical result, we need to use XML. In Listing 6.19, we wrap each pair with a new Assignment
element. The curly braces inside the element cause its contents to be evaluated as XQuery expressions. (See Chapter 7 for details about XML construction.)
Again, whitespace has been inserted to make the results easier to read.
Example 6.19. The same join as Listing 6.18, but using XML to group results
for $proj in doc("projects.xml")/Projects/Project for $emp in doc("team.xml")//Employee where $proj/@owner = $emp/@id return <Assignment>{ $proj/Name, $emp/Name }</Assignment> => <Assignment> <Name>Enter the Tuple Space</Name> <Name>Carl Yates</Name> </Assignment> <Assignment> <Name>Cryptic Code</Name> <Name>Kandy Konrad</Name> </Assignment> <Assignment> <Name>XQuery Bandit</Name> <Name>Jason Abedora</Name> </Assignment> <Assignment> <Name>Micropoly</Name> <Name>Jim Barry</Name> </Assignment>
Another confusing aspect about the result is that there are two Name
elements with different meanings. Conflicting node names (and just a general desire to change names) are common when joining XML data sources together. Listing 6.20 uses the XML constructor syntax to create attributes instead of copying the elements from the original document. (See Chapter 7 for more examples of XML constructors.)
Example 6.20. The same join as Listings 6.18 and 6.19, but reshaping the results
for $proj in doc("projects.xml")/Projects/Project for $emp in doc("team.xml")//Employee where $proj/@owner = $emp/@id return <Assignment proj="{$proj/Name}" emp="{$emp/Name}" /> => <Assignment proj="Enter the Tuple Space" emp="Carl Yates" /> <Assignment proj="Cryptic Code" emp="Kandy Konrad" /> <Assignment proj="XQuery Bandit" emp="Jason Abedora" /> <Assignment proj="Micropoly" emp="Jim Barry" />
Other kinds of joins are possible. For example, Listing 6.21 shows a many-to-many join, so named because many items from the first sequence may correspond to many items from the second sequence.
Example 6.21. A many-to-many join between team.xml and projects.xml
for $proj in doc("projects.xml")/Projects/Project for $emp in doc("team.xml")//Employee where $proj/Category = $emp/Expertise return <Assignment proj="{$proj/Name}" emp="{$emp/Name}" /> => <Assignment proj="Enter the Tuple Space" emp="Carl Yates" /> <Assignment proj="Enter the Tuple Space" emp="Jim Barry" /> <Assignment proj="Cryptic Code" emp="Chaz Hoover" /> <Assignment proj="Cryptic Code" emp="Jason Abedora" /> <Assignment proj="Cryptic Code" emp="Wanda Wilson" /> <Assignment proj="XQuery Bandit" emp="Carl Yates" /> <Assignment proj="XQuery Bandit" emp="Jim Barry" /> <Assignment proj="Micropoly" emp="Wanda Wilson" />
Many-to-many joins are problematic in XML, in part because node values are duplicated. In small examples like these, this duplication may not seem like much of a problem, but in large examples it can significantly affect the size of the result. It can also create problems with node identity, because now the “same” node is copied multiple times to the output.
One way to partially address this problem is to change the join into an outer join. A left outer join includes members from the outer (left) sequence, even when there isn't a member of the inner (right) sequence satisfying the join condition. Similarly, a right outer join keeps items from the inner sequence even when there isn't a corresponding member of the outer sequence. A full outer join keeps items from both sequences.
This technique is illustrated in Listing 6.22, using an outer FLWOR statement to iterate over the first sequence and create one element for each project. An inner FLWOR performs the join, returning zero or more employee names. Separating the join into two FLWOR statements, with the outer one always returning something, is what makes this an outer join.
Example 6.22. An outer join between the team.xml and projects.xml documents
for $proj in doc("projects.xml")/Projects/Project return <Assignment proj="{$proj/Name}">{ for $emp in doc("team.xml")//Employee where $emp/Expertise = $proj/Category return $emp/Name }</Assignment> => <Assignment proj="Enter the Tuple Space"> <Name>Carl Yates</Name> <Name>Jim Barry</Name> </Assignment> <Assignment proj="Cryptic Code"> <Name>Chaz Hoover</Name> <Name>Jason Abedora</Name> <Name>Wanda Wilson</Name> </Assignment> <Assignment proj="XQuery Bandit"> <Name>Carl Yates</Name> <Name>Jim Barry</Name> </Assignment> <Assignment proj="Micropoly"> <Name>Wanda Wilson</Name> </Assignment>
Although the outer join no longer duplicates project names, it still duplicates employee names. In other words, it is a one-to-many join.
The traditional “fix” for representing one-to-many and many-to-many relationships in XML is to select each element exactly once, and to use xs:ID
/xs:IDREF
values in the XML output to express the relationship. This technique requires that the elements being referred to have unique xs:ID
values.
Listing 6.23 demonstrates a FLWOR query that generates a list of employee IDs for each project. This data can then be used with navigation functions like id()
to find the corresponding Employee
elements, instead of duplicating them in the result. This technique has the drawback that two joins will be performed: one to produce the result, and another to consume it. However, typically implementation optimizes ID lookup so that it's faster than arbitrary joins.
Example 6.23. Use xs:ID
relationships instead of duplicating elements
for $proj in doc("projects.xml")/Projects/Project return <Assignment proj="{$proj/Name}" emps="{ for $emp in doc('team.xml')//Employee where $emp/Expertise = $proj/Category return $emp/@id }"/> => <Assignment proj="Enter the Tuple Space" emps="E2 E3" /> <Assignment proj="Cryptic Code" emps="E6 E5 E3" /> <Assignment proj="XQuery Bandit" emps="E2 E3" /> <Assignment proj="Micropoly" emps="E0" />
Not all joins are equi-joins. The query in Listing 6.24 finds all projects for which the owner doesn't have expertise in that project category. (Chapter 11 explains why this query uses not(=)
instead of !=
.)
Example 6.24. Not all joins are equi-joins
for $proj in doc("projects.xml")/Projects/Project for $emp in doc("team.xml")//Employee where $proj/@owner = $emp/@id and not($proj/Category = $emp/Expertise) return <Mismatch>{$emp/Name, $proj/Name}</Mismatch> => <Mismatch> <Name>Kandy Konrad</Name> <Name>Cryptic Code</Name> </Mismatch> <Mismatch> <Name>Jason Abedora</Name> <Name>XQuery Bandit</Name> </Mismatch> <Mismatch> <Name>Jim Barry</Name> <Name>Micropoly</Name> </Mismatch>
Another kind of join is the self-join, which joins a sequence with itself. Self-joins are very common when working with recursive data, such as in the team.xml
document. In fact, we've already seen one example in Chapter 3 already, copied here as Listing 6.25.
Example 6.25. A self-join using path navigation
doc("team.xml")//Employee[Title = doc("team.xml")//Employee[@id="E0"]/Title]/Name
The meaning of this path may be more easily discerned when it's written as a FLWOR expression, like in Listing 6.26. Both queries do the same work, but the second query using FLWOR makes the join more explicit.
Example 6.26. The same self-join as Listing 6.25 using FLWOR instead
let $emp := doc("team.xml")//Employee for $i in $emp, $j in $emp where $i/Title = $j/Title and $j/@id = "E0" return $i/Name
There are many different ways to compare two sequences: member-wise, existentially, and universally to name just three. This section shows how to perform these different kinds of sequence comparisons. Effectively, each kind of sequence comparison is just another kind of join.
In Chapter 5 you learned that the general comparison operators compare sequences existentially. They test whether there exists a member in each sequence such that the comparison is true. For example, to test whether there exists a member of one sequence that is greater than a member of a second sequence, you can write $seq1 > $seq2
.
However, the general comparison operators can only perform simple comparisons, and not more complex conditions. Fortunately, XQuery provides many other ways to express existential comparison.
One is to use the some
operator. For example, the previous comparison could also have been expressed as some $a in $seq1, $b in $seq2 satisfies $a > $b
. However, some
is more flexible because it allows you to express other kinds of conditions. For example, suppose that instead of merely comparing one member against another, you want to test whether it's greater by a certain amount (say, 3). Then you could use the query some $a in $seq1, $b in $seq2 satisfies $a > $b + 3
.
Path expressions are a simpler but somewhat less expressive way to perform existential comparisons, applying a predicate condition to each node in a step. For example, you could write exists($seq1[. > $seq2])
. Like some
, paths can express more complex conditions. For instance, you could express the same comparison using some
using the path exists($seq1[ . - 3 > $seq2])
. Paths are very concise, but can be harder to read and can have unintended consequences. For example, the expression exists($seq1[ . > $seq2 + 3])
results in an error because addition cannot be applied to a sequence of more than one item.
The most general-purpose way to perform an existential comparison is using FLWOR expressions. For example, you could write for $a in $seq1, $b in $seq2 where $a > $seq2 return $a
. This not only compares the two sequences, but also returns all the members of the first sequence that satisfied the condition. To test that at least one such member exists, just wrap the FLWOR expression with the exists()
function. Listing 6.27 compares these three styles side-by-side.
The deep-equal()
function compares sequences memberwise. It tests whether the two sequences have the same length, and if so, whether each pair of corresponding members compares true. However, this function always performs deep equality. XQuery doesn't have a built-in function that performs memberwise comparison by node identity, or that performs any other kind of memberwise action.
For instance, suppose that you want to take two sequences of numbers and add or compare them memberwise. The easiest way to express memberwise operations in XQuery is to use a join on position. The idea is to keep track of which position you are at as you iterate through the first sequence, so that you can pick out the member of the second sequence that is at the same location.
For example, the function in Listing 6.28 is similar to deep-equal()
, except that it compares nodes by identity instead of using deep equality.
Example 6.28. deep-equal()
, without the deep
declare function shallow-equal($seq1 as node()*, $seq2 as node()*) as xs:boolean { if (count($seq1) != count($seq2)) then false() else empty( for $i at $p1 in $seq1 for $j at $p2 in $seq2 where $p1 = $p2 and $i is not $j return 1 ) };
For another example, the query in Listing 6.29 returns true if every member of the first sequence is greater than the corresponding member of the second sequence.
Example 6.29. Compare two sequences memberwise
count($seq1) = count($seq2) and empty( for $i at $p1 in $seq1 for $j at $p2 in $seq2 where $p1 = $p2 and not($i > $j) return 1 )
These examples could also be expressed using predicates to apply an implicit join. For example, Listing 6.30 is another way to achieve the same effect as in Listing 6.29.
Example 6.30. An equivalent way to compare two sequences memberwise
count($seq1) = count($seq2) and empty( for $i at $p in $seq1 where not($i > $seq2[$p]) return 1 )
You may notice the similarity between this query and the every
operator. Unfortunately, every
doesn't provide a way to introduce position variables, but you can compensate for this by using the to
operator to range up to the length of the sequence, as in Listing 6.31.
Example 6.31. Yet another way to produce the same effect
count($seq1) = count($seq2) and every $p in 1 to count($seq1) satisfies $seq1[$p] > $seq2[$p]
Another consideration when writing these comparisons is the decision as to how to handle exceptional conditions: If one sequence is longer than the other, do you want to return false, raise an error, or ignore the extra members?
Because $seq[$pos]
returns the empty sequence when $pos
goes beyond the end of the sequence, you could omit the count()
comparison to ignore additional members when the sequences don't have the same length, as in Listing 6.32. Alternatively, you can use a conditional and the error()
function to raise an error when the sequences differ in length.
Another way to compare sequences is to require that a condition be true for every member of the sequence. Section 6.3 already showed how to perform universal quantification using the every
operator (or an equivalent FLWOR). Like existential quantification, universal quantification can be expressed in several forms, some of which are shown in Listing 6.33.
Example 6.33. Universal quantification, like existential, has many forms
every $a in $seq1, $b in $seq2 satisfies $a gt $b empty( for $a in $seq1 where not($a > $seq2) return $a ) empty($seq1[not(. > $seq2)])
Another option is what might be called universal memberwise comparison: Require every member to compare true against its corresponding member in the other sequence. As before, you have to decide how to handle the case that the sequences differ in length. The example in Listing 6.34 returns false if the two sequences differ in length. It uses the fact that every pair of members satisfies a condition if no pair of them fails it.
FLWOR expressions have a natural ordering, which is determined by the bindings of the for
variables (the row order of the tuple space). However, the order by
clause may be used to apply some other ordering. The order by
clause takes a list of expressions, called sort keys.
The order by
clause sorts the tuple space according to each of the sort keys in order. It's important to understand that it's the rows of the tuple space that are sorted, not the final (flattened) result. The sort keys don't have to be returned in the result. Listing 6.35 demonstrates a simple sort.
Example 6.35. The order by
clause sorts the result (after the where
clause)
for $i in (4, 2, 3, 1) order by $i return $i => (1, 2, 3, 4)
Each sort key is first evaluated and atomized; if the result isn't empty or a singleton, then an error is raised. Untyped values are cast to xs:string
. A sort key must evaluate to the same type for every tuple in the tuple space, and the type must be totally ordered (i.e., support the gt
operator). Sorting depends on the gt
value comparison operator, which depends on the collation when comparing xs:string
values. The example in Listing 6.36 may produce a different result from the one given here, depending on the default collation.
Example 6.36. Sorting by string values depends on the default collation
for $i in doc("team.xml")//Employee where exists($i//Employee) order by $i/@id descending return $i/Name => <Name>Chaz Hoover</Name> <Name>Carl Yates</Name> <Name>Kandy Konrad</Name>
When the first sort key evaluated for two tuples results in a tie, then subsequent sort keys are used to break the tie. If all the keys tie, then the order of the tied tuples is implementation-defined unless the stable
keyword is used in front of the order by
keyword. In that case, the tied tuples retain their original ordering relative to each other. In Listing 6.37, the first two results tied, so their original relative order is preserved.
Example 6.37. Stable sorting preserves the original order of ties
for $i in (<a b="z">2</a>, <a b="x">1</a>, <a b="y">1</a>) stable order by $i return $i => <a b="x">1</a> <a b="y">1</a> <a b="z">2</a>
Each sort key expression may be followed by modifiers that control how that key affects the sort order. The most common modifiers are ascending and descending (ascending is the default) with the obvious effect. The other modifiers, empty least
and empty greatest
, control how the empty sequence and NaN
sort relative to all other values.
When empty least
is specified, then the empty sequence sorts less than all other non-empty values, and NaN
sorts less than all other non-empty and non-NaN
values. When empty greatest
is specified, then the empty sequence sorts greater than all other non-empty values, and NaN
sorts greater than all other non-empty and non-NaN
values. Listing 6.38 demonstrates the interaction between empty least
and these special values.
Example 6.38. Empty least and greatest affect how NaN
and ()
sort
for $i in (1E0, 2E0, 3E0, 0E0 div 0) let $key := if ($i < 2.5) then $i else () order by $key empty least, $i descending return $i => (2E0, 1E0, NaN, 3E0)
These modifiers are demonstrated by Listing 6.38, which generates the ordered tuple space illustrated in Table 6.4. Because the first sort key has empty least
, the two empty sequences sort least, followed by NaN
, followed by 3E0
. Then the second key is used to break the tie between the first two values. Because it has descending
, 2E0
sorts before 1E0
, producing the final result shown.
Table 6.4. The ordered tuple space corresponding to Listing 6.38
$I | $key | return |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
When sorting by string values, it's important to pay attention to Unicode normalization and collation. For example, you may want letters to sort in a case-insensitive way, or you may want alternate spellings of the same phonetic symbol to sort the same way. The former is controlled by collation, while the latter is controlled by Unicode normalization. Support for both of these depends partly on the implementation (see Chapter 8 for details). You can also apply string functions such as lower-case()
in the sort key, although this will generally perform less well than using collation.
When joining two sequences together, sometimes you also want to sort the result by something that depends on the result. For example, you might want to group cities by region with regions sorted by the sum of their city populations, or list team leads by the number of people or the average years of experience accumulated on their team.
Queries like these follow a grouping pattern, and some languages (such as SQL) have dedicated operators to support them. Other languages, such as XSLT and XQuery, express group queries just like any other query, and then implementations vary in how well they recognize the grouping patterns and execute them.
For example, consider listing team leads in decreasing order by the number of direct reports they have. The query in Listing 6.39 performs this grouping.
Example 6.39. A simple grouping query that sorts team leads by team size
for $i in doc("team.xml")//Employee let $reports := count($i/Employee) where $reports > 0 order by $reports descending return <Employee name="{$i/Name}" reports="{$reports}"/> => <Employee name="Chaz Hoover" reports="2"/> <Employee name="Carl Yates" reports="2"/> <Employee name="Kandy Konrad" reports="1"/>
In this query, we first iterate over all Employee
elements in the team.xml
document. Then, for each employee, we calculate the number of direct reports (that is, the number of Employee
element children). We filter the result with a where
clause to eliminate employees who have no direct reports. The order by
clause reverse sorts by the number of direct reports, and finally the return
clause constructs a meaningful result.
Instead, we could reverse sort the list by the average number of years on each team. This is a somewhat more complicated query, in part because we must iterate over all the Employee
descendants and calculate their average years of experience. Because the avg()
function takes only sequences of atomic values, when we pass it the sequence of year attributes, it first applies the function conversion rules (Chapter 2), which atomizes this list of nodes and produces a sequence of values. We use the descendant-or-self
axis directly instead of the abbreviation //
, so that we include the lead's years of experience in the average. The round-half-to-even()
function is used to round the result to two decimal places.
Example 6.40. Team leads reverse sorted by average years of experience on their team
for $i in doc("team.xml")//Employee let $years := avg($i/descendant-or-self::Employee/@years) where exists($i/Employee) order by $years descending return <Employee name="{$i/Name}" years="{round-half-to-even($years, 2)}"/> => <Employee name="Chaz Hoover" years="4.82"/> <Employee name="Carl Yates" years="2.63"/> <Employee name="Kandy Konrad" years="8.35"/>
Conceptually, this query produces the nested tuple space depicted in Table 6.5. In the interest of space, this table shows only the employee ID values instead of the entire Employee
nodes. The $years
column requires an implicit join between each Employee
and its descendants, and then computes the average, which is listed in the order by
column. In practice, some implementations may optimize queries like these to compute the aggregate on the fly.
Table 6.5. The tuple space corresponding to Listing 6.40
$i/@id | $years | Where | order by | return |
---|---|---|---|---|
|
|
|
<Employee name="Chaz Hoover" years="4.82"/> | |
|
|
|
<Employee name="Carl Yates" years="2.63"/> | |
|
| |||
|
| |||
|
|
|
<Employee name="Kandy Konrad" years="8.35"/> | |
|
| |||
|
|
It's important to note also that the order by
and return
clauses aren't evaluated when the where
clause is false (avoiding, for example, potential error cases).
You might also like to generate a nice XHTML table containing the results, with two employees per line. In this case, you want to group the results by position but after they have been sorted as above. This can be accomplished by applying another FLWOR query to the original one, like in Listing 6.41.
Example 6.41. Generating a table of the results, grouped into pairs
<table> <tr> <th>Employee</th><th>years</th> <th>Employee</th><th>years</th> </tr> { let $emps := (for $i in doc("team.xml")//Employee let $years := avg($i/descendant-or-self::Employee/@years) where exists($i/Employee) order by $years descending return <Employee name="{$i/Name}" years="{round-half-to-even($years, 2)}"/>) for $e at $pos in $emps let $next := $emps[$pos + 1] where $pos mod 2 = 0 return <tr> <td>{string($e/@Name)}</td><td>{string($e/@years)}</td> <td>{string($next/@Name)}</td><td>{string($next/@years)}</td> </tr> }</table>
The XQuery FLWOR expression introduces variables using let
, iterates over sequences using for
, filters and joins sequences using where
, sorts the remaining items using order by
, and constructs a result using return
. FLWOR is the central expression of most queries, and the most powerful.
The some
and every
expressions are simplified FLWOR statements that test whether some or every member of a sequence (or sequences) satisfy some condition. They can be convenient when trying to write easy-to-read queries.
Every XQuery developer should own a copy of An Introduction to Database Systems by C. J. Date, currently in its 8th edition. Although focused on the relational data model, this classic contains much useful information on joins, grouping, and data in general.
The FLWOR expression comes directly from Quilt, which was a precursor to XQuery. For a historical perspective on Quilt, check out the Web site at http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html.
18.191.135.224