Chapter 6. Selecting and Joining Using FLWORs

This chapter describes the facilities in XQuery for selecting, filtering, and joining data from one or more input documents. It covers the syntax of FLWOR expressions and quantified expressions.

Selecting with Path Expressions

Chapter 4 described how to use path expressions to select elements from input documents. For example, the expression:

doc("catalog.xml")//product[@dept = "ACC"]/name

can be used to select the names of all the products in the ACC department. You can add multiple predicates (expressions in square brackets) to filter the results based on more than one criterion. You can even add logical and other expressions to predicates, as in:

doc("catalog.xml")//product[@dept = "ACC" or @dept = "WMN"]/name

A path expression can be the entire content of a query; there is no requirement that there be a FLWOR expression in every query. Path expressions are useful for queries where no new elements and attributes are being constructed and the results don’t need to be sorted. A path expression can be preferable to a FLWOR because it is more compact and some implementations will be able to evaluate it faster.

FLWOR Expressions

FLWOR expressions, also known simply as FLWORs, are used for queries that are more complex. In addition to allowing more readable and structured selections, they allow functionality such as joining data from multiple sources, constructing new elements and attributes, evaluating functions on intermediate values, and sorting results.

FLWOR (pronounced “flower”), stands for “for, let, where, order by, return,” the keywords that are used in the expression. Example 6-1 shows a FLWOR that is equivalent to the second path expression from the previous section.

Example 6-1. FLWOR
for $prod in doc("catalog.xml")//product
let $prodDept := $prod/@dept
where $prodDept = "ACC" or $prodDept = "WMN"
return $prod/name

Of course, this is far more verbose, and for such a simple example, the path expression is preferable. However, this example is useful as an illustration before moving on to examples that are more complex. As you can see, the FLWOR is made up of several parts:

for

This clause sets up an iteration through the product elements returned by the path expression. The variable $prod is bound, in turn, to each product in the sequence. The rest of the FLWOR is evaluated once for each product, in this case, four times.

let

This clause binds the $prodDept variable to the value of the dept attribute.

where

This clause selects elements whose dept attribute is equal to ACC or WMN.

return

This clause returns the name child of each of the three product elements that pass the where clause.

The overall syntax of a FLWOR is shown in Figure 6-1.

Figure 6-1. Syntax of a FLWOR

The first clause in a FLWOR must be a for, let, or window clause. After that, any number of any of the clauses listed may appear, in any order. The final clause is the required return clause. The clauses of a FLWOR are listed in Table 6-1, along with a link to where they are covered fully in the book.

Table 6-1. FLWOR clauses
KeywordDescriptionChapter/Section
forSets up an iteration “The for Clause”
letBinds a variable “The let Clause”
for tumbling window, for sliding windowCreates windows “Windowing”
whereFilters out elements “The where Clause”
group byGroups results “Grouping Using the group by Clause”
order bySorts the results “The order by Clause”
countBinds the position to a variable “Using the count Clause”
returnSpecifies what to return “The return Clause”

FLWORs can be the whole query, or they can appear in other expressions such as in the return clause of another FLWOR or even in a function call, as in:

max(for $prod in doc("catalog.xml")//product
    return xs:integer($prod/number))

The for and return keywords are aligned vertically here to make the structure of the FLWOR more obvious. This is generally good practice, although not always possible.

Let’s take a closer look at the main clauses that make up the FLWOR.

The for Clause

A for clause, whose syntax is shown in Figure 6-2, sets up an iteration that allows the rest of the FLWOR to be evaluated multiple times, once for each item in the sequence returned by the expression after the in keyword. This sequence, also known as the binding sequence, can evaluate to any sequence of zero, one, or more items. In the previous example, it was a sequence of product elements, but it could also be atomic values, attribute nodes, or indeed items of any kind, or a mixture of items. If the binding sequence is the empty sequence, the rest of the FLWOR is simply not evaluated (it iterates zero times).

Additional features of the for clause are described elsewhere in the book. The at clause, which allows for positional variables, is described in “Working with Positions and Sequence Numbers”. The as clause, which declares the type of the variable, is described in “Type Declarations in FLWORs”. The allowing empty keywords, which allow outer joins, are described in “Outer joins with allowing empty.

Figure 6-2. Syntax of a for clause

The FLWOR expression with its for clause is similar to loops in procedural languages such as C. However, one key difference is that in XQuery, because it is a functional language, the iterations are considered to be in no particular order. They do not necessarily occur sequentially, one after the other. One manifestation of this is that you cannot keep variable counters that are incremented with each iteration, or continuously append to the end of a string variable with each iteration. “Working with Positions and Sequence Numbers” provides more information about simulating counters.

Range expressions

Another useful technique is to supply a sequence of integers in the for clause in order to specify the number of times to iterate. This can be accomplished through a range expression, which creates a sequence of consecutive integers. For example, the range expression 1 to 3 evaluates to a sequence of integers (1, 2, 3). The FLWOR shown in Example 6-2 iterates three times and returns three oneEval elements.

Example 6-2. Using a range expression

Query

for $i in 1 to 3
return <oneEval>{$i}</oneEval>

Results

<oneEval>1</oneEval>
<oneEval>2</oneEval>
<oneEval>3</oneEval>

Range expressions can be included within parenthesized expressions, as in (1 to 3, 6, 8 to 10). They can also use variables, as in 1 to $prodCount. Each of the expressions before and after the to keyword must evaluate to an integer.

If the first integer is greater than the second, as in 3 to 1, or if either operand is the empty sequence, the expression evaluates to the empty sequence. The reason for this is to ensure that for $i in 1 to count($seq) does the expected thing even if $seq is an empty sequence.

You can use the reverse function if you want to descend in value, as in:

for $i in reverse(1 to 3)

You can also increment by some value other than 1 by using an expression like:

for $i in (1 to 100)[. mod 2 = 0]

which gives you every other number (2, 4, 6, etc.) up to 100.

Multiple for clauses

You can use multiple for clauses in a FLWOR, which is similar to nested loops in a programming language. The result is that the rest of the FLWOR is evaluated for every combination of the values of the variables. Example 6-3 shows a query with two for clauses, and demonstrates the order of the results.

Example 6-3. Multiple for clauses

Query

for $i in (1, 2)
for $j in ("a", "b")
return <oneEval>i is {$i} and j is {$j}</oneEval>

Results

<oneEval>i is 1 and j is a</oneEval>
<oneEval>i is 1 and j is b</oneEval>
<oneEval>i is 2 and j is a</oneEval>
<oneEval>i is 2 and j is b</oneEval>

The order is significant; it uses the first value of the first variable ($i) and iterates over the values of the second variable ($j), then takes the second value of $i and iterates over the values of $j.

Also, multiple variables can be bound in a single for clause, separated by commas. This has the same effect as using multiple for clauses. The example shown in Example 6-4 returns the same results as Example 6-3. This syntax is shorter but can be less clear in the case of complex expressions.

Example 6-4. Multiple variable bindings in one for clause
for $i in (1, 2), $j in ("a", "b")
return <oneEval>i is {$i} and j is {$j}</oneEval>

Specifying multiple variable bindings (or multiple for clauses) is especially useful for joining data. This is described further in “Joins”.

The let Clause

A let clause is a convenient way to bind a variable to a value. Unlike a for clause, a let clause does not result in iteration; it binds the whole sequence to the variable rather than binding each item in turn. The let clause serves as a programmatic convenience that avoids repeating the same expression multiple times. With some implementations, it may improve performance, because the expression is evaluated only once instead of each time it is needed.

The syntax of a let clause is shown in Figure 6-3. (The as clause, which declares the type of the variable, is described in “Type Declarations in FLWORs”.)

Figure 6-3. Syntax of a let clause

To illustrate the difference between for and let clauses, compare Example 6-5 with Example 6-2.

Example 6-5. Using a let clause with a range expression

Query

let $i := (1 to 3)
return <oneEval>{$i}</oneEval>

Results

<oneEval>1 2 3</oneEval>

The FLWOR with the let clause returns only a single oneEval element, because no iteration takes place and the return clause is evaluated only once.

One or more let clauses can be intermingled with one or more for clauses. Each of the let and for clauses may reference a variable bound in any previous clause. Example 6-6 shows such a FLWOR.

Example 6-6. Intermingled for and let clauses
let $doc := doc("catalog.xml")
for $prod in $doc//product
let $prodDept := $prod/@dept
let $prodName := $prod/name
where $prodDept = "ACC" or $prodDept = "WMN"
return $prodName

As with for clauses, adjacent let clauses can be represented using a slightly shortened syntax that replaces the let keyword with a comma, as in:

let $prodDept := $prod/@dept, $prodName := $prod/name

Another handy use for the let clause is to perform several functions or operations in order. For example, suppose you want to take a string and replace all instances of at with @, replace all instances of dot with a period (.), and remove any remaining spaces. You could write the expression:

replace(replace(replace($myString, 'at', '@'), 'dot', '.'), ' ', '')

but that is difficult to read and debug, especially as more functions are added. An alternative is the expression:

let $myString2 := replace($myString, 'at', '@')
let $myString3 := replace($myString2, 'dot', '.')
let $myString4 := replace($myString3, ' ', '')
return $myString4

which makes the query clearer.

The where Clause

The where clause is used to specify criteria that filter the results of the FLWOR. Its syntax is shown in Figure 6-4.

Figure 6-4. Syntax of a where clause

The where clause can reference variables that were bound by a for or let clause. For example:

where $prodDept = "ACC" or $prodDept = "WMN"

references the $prodDept variable. In addition to expressing complex filters, the where clause is also very useful for joins.

A where clause can be composed of many expressions joined by and and or keywords, as shown in Example 6-7.

Example 6-7. A where clause with multiple expressions
for $prod in doc("catalog.xml")//product
let $prodDept := $prod/@dept
where $prod/number > 100
      and starts-with($prod/name, "F")
      and exists($prod/colorChoices)
      and ($prodDept = "ACC" or $prodDept = "WMN")
return $prod

Starting in version 3.0, it is also possible to have multiple where clauses in the same FLWOR. In previous versions, only one was allowed.

Note that when using paths within the where clause, they need to start with an expression that sets the context. For example, it has to say $prod/number > 100 rather than just number > 100. Otherwise, the processor does not know where to look for the number child.

The effective boolean value of the where clause is calculated. This means that if the where clause evaluates to a Boolean value false, a zero-length string, the number 0 or NaN, or the empty sequence, it is considered false, and the return clause of the FLWOR is not evaluated for that iteration. If the effective boolean value is true, the return clause is evaluated. For example, you could use:

where $prod/name

which returns true if $prod has a name child, and false if it does not. As another example, you could use:

where $numProds

which returns true if $numProds is a numeric value that is not zero (and not NaN). However, these types of expressions are somewhat cryptic, and it is preferable to use clearer expressions, such as:

where exists($prod/name)
and $numProds > 0

The return Clause

The return clause consists of the return keyword followed by the single expression that is to be returned. It is evaluated once for each iteration, assuming the where clause evaluated to true. The result value of the entire FLWOR is a sequence of items returned by each evaluation of the return clause. For example, the value of the entire FLWOR:

for $i in (1 to 3)
return <oneEval>{$i}</oneEval>

is a sequence of three oneEval elements, one for each time the return clause was evaluated.

If more than one expression is to be included in the return clause, they can be combined in a sequence. For example, the FLWOR:

for $i in (1 to 3)
return (<one>{$i}</one>, <two>{$i}</two>)

returns a sequence of six elements, two for each time the return clause is evaluated. The parentheses and comma are used in the return clause to indicate that a sequence of the two elements should be returned. If no parentheses or comma were used, the two element constructor would not be considered part of the FLWOR.

The Scope of Variables

When a variable is bound in a for or let clause, it can be referenced anywhere in that FLWOR after the clause that binds it. This includes, for example, other subsequent let or for clauses, the where clause, or the return clause. It cannot be referenced in a preceding clause, and it cannot be referenced in the clause itself, as in:

let $count := 0
for $prod in doc("catalog.xml")//product
let $count := $count + 1

This does not raise an error, but it is actually declaring a new variable with the same name, so it will have unexpected results, as described in “Adding Sequence Numbers to Results”.

If you bind two variables with the same name with the same containing expression, such as two for or let clauses that are part of the same FLWOR, you may again get unexpected results. It will create two separate variables with the same name, where the second masks the first and makes it inaccessible within the scope of the second variable.

Quantified Expressions

A quantified expression determines whether some or all of the items in a sequence meet a particular condition. For example, if you want to know whether any of the items in an order are from the accessory department, you can use the expression shown in Example 6-8. This expression will return true.

Example 6-8. Quantified expression using the some keyword
some $dept in doc("catalog.xml")//product/@dept
satisfies ($dept = "ACC")

Alternatively, if you want to know if every item in an order is from the accessory department, you can simply change the word some to every, as shown in Example 6-9. This expression will return false.

Example 6-9. Quantified expression using the every keyword
every $dept in doc("catalog.xml")//product/@dept
satisfies ($dept = "ACC")

A quantified expression always evaluates to a Boolean value (true or false). As such, it is not useful for selecting the elements or attributes that meet certain criteria, but rather for simply determining whether any exist. Quantified expressions can generally be easily rewritten as FLWORs or even as simple path expressions. However, the quantified expression can be more compact and easier for implementations to optimize.

A quantified expression is made of several parts:

  • A quantifier (the keyword some or every)

  • One or more in clauses that bind variables to sequences

  • A satisfies clause that contains the test expression

The syntax of a quantified expression is shown in Figure 6-5. (The as clause, which specifies the type of the variable, is described in “Type Declarations in Quantified Expressions”.)

Figure 6-5. Syntax of a quantified expression

The processor tests the satisfies expression (using its effective boolean value) for every item in the sequence. If the quantifier is some, it returns true if the satisfies expression is true for any of the items. If the quantifier is every, it returns true only if the satisfies expression is true for all items. If there are no items in the sequence, an expression with some always returns false, while an expression with every always returns true.

You can use the not function with a quantified expression to express “not any” (none), and “not every.” Example 6-10 returns true if none of the product elements have a dept attribute equal to ACC. For our particular catalog, this returns false.

Example 6-10. Combining the not function with a quantified expression
not(some $dept in doc("catalog.xml")//product/@dept
      satisfies ($dept = "ACC"))

Binding Multiple Variables

You can bind multiple variables in a quantified expression by separating the clauses with commas. As with the for clauses of FLWORs, the result is that every combination of the items in the sequences is taken. Example 6-11 returns true because there is a combination of values (where $i is 3 and $j is 10) where the satisfies expression is true.

Example 6-11. Binding multiple variables in a quantified expression
some $i in (1 to 3), $j in (10, 11)
  satisfies $j - $i = 7

Selecting Distinct Values

The distinct-values function selects distinct atomic values from a sequence. For example, the function call:

distinct-values(doc("catalog.xml")//product/@dept)

returns all the distinct values of the dept attribute, namely ("WMN", "ACC", "MEN"). This function determines whether two values are distinct based on their value equality by using the eq operator.

It is also common to select a distinct set of combinations of values. For example, you might want to select all the distinct department/product number combinations from the product catalog. You cannot use the distinct-values function directly for this, because it accepts only one sequence of atomic values, not multiple sequences of multiple values. Instead, you could use the expression shown in Example 6-12.

Example 6-12. Distinctness on a combination of values

Query

let $prods := doc("catalog.xml")//product
for $d in distinct-values($prods/@dept),
    $n in distinct-values($prods[@dept = $d]/number)
return <result dept="{$d}" number="{$n}"/>

Results

<result dept="WMN" number="557"/>
<result dept="ACC" number="563"/>
<result dept="ACC" number="443"/>
<result dept="MEN" number="784"/>

For each distinct department, bound to $d, it generates a list of distinct product numbers within that department by using the predicate [@dept = $d]. It then returns the resulting combination of values as a result element. The order in which the values are returned is implementation-dependent, so it can be unpredictable.

Additional data items can be added by adding for clauses with the appropriate predicates.

Joins

One of the major benefits of FLWORs is that they can easily join data from multiple sources. For example, suppose you want to join information from your product catalog (catalog.xml) and your order (order.xml). You want a list of all the items in the order, along with their number, name, and quantity. Example 6-13 shows a FLWOR that performs this join.

Example 6-13. Two-way join in a predicate

Query

for $item in doc("order.xml")//item,
    $prod in doc("catalog.xml")//product[number = $item/@num]
return <item num="{$item/@num}"
             name="{$prod/name}"
             quan="{$item/@quantity}"/>

Results

<item num="557" name="Fleece Pullover" quan="1"/>
<item num="563" name="Floppy Sun Hat" quan="1"/>
<item num="443" name="Deluxe Travel Bag" quan="2"/>
<item num="784" name="Cotton Dress Shirt" quan="1"/>
<item num="784" name="Cotton Dress Shirt" quan="1"/>
<item num="557" name="Fleece Pullover" quan="1"/>

The first part of the for clause selects each item from the order, and the second part selects the matching product element from the catalog.xml document, using a predicate to identify the one whose number matches the item’s num attribute. Another way to accomplish the same thing is by using a where clause instead of a predicate, as shown in Example 6-14. This query yields the same results.

Example 6-14. Two-way join in a where clause
for $item in doc("order.xml")//item,
    $prod in doc("catalog.xml")//product
where $item/@num = $prod/number
return <item num="{$item/@num}"
             name="{$prod/name}"
             quan="{$item/@quantity}"/>

Whether to use a predicate or a where clause is a matter of personal preference. When many conditions apply, a where clause can be more readable. However, for simple conditions, a predicate may be preferable because it is less verbose. In some implementations, predicates perform faster than where clauses.

Three-Way Joins

Joins can be extended to allow more than two sources to be joined together. For example, suppose that, along with catalog.xml and order.xml, you also want to join the prices.xml document, which contains current pricing information for each product.

The query shown in Example 6-15 joins the prices.xml document with the others to provide pricing information in the results. It uses two expressions in the where clause to implement the two joins.

Example 6-15. Three-way join in a where clause

Query

for $item in doc("order.xml")//item,
    $prod in doc("catalog.xml")//product,
    $price in doc("prices.xml")//prod
where $item/@num = $prod/number and $prod/number = $price/@num
return <item num="{$item/@num}"
             name="{$prod/name}"
             price="{$price/price}"/>

Results

<item num="557" name="Fleece Pullover" price="29.99"/>
<item num="563" name="Floppy Sun Hat" price="69.99"/>
<item num="443" name="Deluxe Travel Bag" price="39.99"/>
<item num="557" name="Fleece Pullover" price="29.99"/>

Outer Joins

The previous join examples in this section are known as inner joins; the results do not include items without matching products or products without matching items. Suppose you want to create a list of products and join it with the price information. Even if there is no price, you still want to include the product in the list. This is known in relational databases as an outer join.

The query in Example 6-16 performs an outer join. It uses two FLWORs, one embedded in the return clause of the other. The outer FLWOR returns the list of products, regardless of the availability of price information. The inner FLWOR selects the price, if it is available.

Example 6-16. Outer join

Query

for $prod in doc("catalog.xml")//product
return <product number="{$prod/number}">{
                attribute price
                   {for $price in doc("prices.xml")//prod
                    where $prod/number = $price/@num
                    return $price/price}
  }</product>

Results

<product number="557" price="29.99"/>
<product number="563" price="69.99"/>
<product number="443" price="39.99"/>
<product number="784" price=""/>

Product 784 doesn’t have a corresponding price in the prices.xml document, so the price attribute has an empty value for that product.

Outer joins with allowing empty

Version 3.0 added a new feature to the for clause that makes outer joins simpler and more explicit. Adding the keywords allowing empty into the for clause means that an outer join should be performed that allows that value to be absent. Example 6-17 shows an example equivalent to Example 6-16 (with the same results) that takes advantage of this new syntax.

Example 6-17. Outer join with allowing empty
xquery version "3.0";
for $prod in doc("catalog.xml")//product
for $price allowing empty
    in doc("prices.xml")//prices/priceList/prod[@num = $prod/number]
return <product number="{$prod/number}" price="{$price/price}"/>

In this example, because allowing empty is specified in the for clause for $price, the return clause is evaluated once in the case where $price is the empty sequence. Without allowing empty, the return clause would have been evaluated zero times for that particular $prod.

Joins and Types

The where clauses in the join examples use the = operator to determine whether two values are equal. Keep in mind that XQuery considers type when determining whether two values are equal. If schemas are not used with these documents, both values are untyped, and the join shown in Example 6-16 compares the values as strings. Unless they are cast to numeric types, the join does not consider different representations of the same number equal, for example 0557 and 557.

On the other hand, if number in catalog.xml is declared as an xs:integer, and the num attribute in prices.xml is declared as an xs:string, the join will not work. One value would have to be explicitly cast to the other’s type, as in:

where $prod/number = xs:integer($price/@num)
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.161.132