Aggregating Values

In addition to simply regrouping items, it is often desirable to perform calculations on the groups. For example, suppose you want to know the number of item elements in a department, or the sum of the quantities for a department. This type of aggregation can be performed using the aggregation functions. Example 7-10 shows some of these functions in action.

Example 7-10. Aggregation

Query
for $d in distinct-values(doc("order.xml")//item/@dept)
let $items := doc("order.xml")//item[@dept = $d]
order by $d
return <department code="{$d}"
                   numItems="{count($items)}"
                   distinctItemNums="{count(distinct-values($items/@num))}"
                   totQuant="{sum($items/@quantity)}"/>
Results
<department code="ACC" numItems="2" distinctItemNums="2" totQuant="3"/>
<department code="MEN" numItems="2" distinctItemNums="1" totQuant="2"/>
<department code="WMN" numItems="2" distinctItemNums="1" totQuant="2"/>

Here is how the aggregation functions work:

count

This function is used to determine the number of items in the sequence. In Example 7-10, the count function is used to calculate the value of numItems, which is the number of items in the department. It is also used to calculate the value of distinctItemNums. In the latter case, the count function is combined with the distinct-values function to count only the unique numbers in that department.

sum

This function is used to determine the total value of the items in a sequence. In Example 7-10, the sum function is used to calculate the value of totQuant, the sum of all the quantity attributes for that department.

min and max

These functions are used to determine the minimum and maximum values of the items in the sequence.

avg

This function is used to determine the average value of the items in a sequence.

The sum and avg functions accept values that are all numeric, all xs:yearMonthDuration values, or all xs:dayTimeDuration values. The max and min functions accept values of any type that is ordered (i.e., values can be compared using < and >). This includes strings, dates, and many other types.

The sum, min, max, and avg functions treat untyped data as numeric. This means that if you are not using a schema, and you want to find a maximum string value, you need to use an expression like:

max(doc("order.xml")//item/string(@dept))

which uses the string function to convert each value to xs:string before the comparison.

Ignoring "Missing" Values

The sequence passed to an aggregation function may contain nodes that are zero-length strings, even though the user might think of them as "missing" values. For example, the minimum value of the color attribute in order.xml is black. However, if there had been an item with a color attribute whose value was a zero-length string (as in color=""), the min function would have returned a zero-length string.

Counting "Missing" Values

On the other hand, there may be cases where you want "missing" values to be taken into consideration, but they are not. For example, the avg function ignores any absent nodes. If you want the average product discount, and you use:

avg(doc("prices.xml")//discount)

you get the average of the two discount values. It does not take into account the fact that there are three products, and that you might want the discount to be counted as zero for the product with no discount child. To count absent discount children as zero, you need to calculate the average explicitly, using:

sum(doc("prices.xml")//prod/discount)
       div count(doc("prices.xml")//prod)

On the other hand, if a prod had an empty discount child (i.e., <discount></discount>), it would be considered a zero-length string and the avg function would raise an error because this value is not of a numeric or duration type. In that case, you would have to test for missing values using:

avg(doc("prices.xml")//prod/discount[. != ""])

Aggregating on Multiple Values

So far, the aggregation examples assume that you want to group on one value, the dept attribute. Suppose you want to group on two values: the dept attribute and the num attribute. You would like to know the number of items and total quantity for each department/product number combination. This could be accomplished using the query shown in Example 7-11.

Example 7-11. Aggregation on multiple values

Query
let $allItems := doc("order.xml")//item
for $d in distinct-values($allItems/@dept)
for $n in distinct-values($allItems[@dept = $d]/@num)
let $items := $allItems[@dept = $d and @num = $n]
order by $d, $n
return <group dept="{$d}" num="{$n}"
              numItems="{count($items)}"
              totQuant="{sum($items/@quantity)}"/>
Results
<group dept="ACC" num="443" numItems="1" totQuant="2"/>
<group dept="ACC" num="563" numItems="1" totQuant="1"/>
<group dept="MEN" num="784" numItems="2" totQuant="2"/>
<group dept="WMN" num="557" numItems="2" totQuant="2"/>

The query uses two for clauses to obtain two separate lists of distinct departments and distinct numbers. Using two for clauses results in the rest of the FLWOR being evaluated once for every possible combination of department and product number. The second for clause uses a predicate to choose only numbers that exist in that particular department.

The let clause binds $items to a list of items that exist for that department/number combination. If any items exist in $items, the return clause returns a group element with the summarized information for that combination.

Constraining and Sorting on Aggregated Values

In addition to returning aggregated values in the query results, you can constrain and sort the results on the aggregated values. Suppose you want to return the similar results to those shown in Example 7-11, but you only want the groups whose total quantity (totQuant) is greater than 1, and you want the results sorted by the number of items (numItems). The query shown in Example 7-12 accomplishes this.

Example 7-12. Constraining and sorting on aggregated values

Query
let $allItems := doc("order.xml")//item
for $d in distinct-values($allItems/@dept)
for $n in distinct-values($allItems/@num)
let $items := $allItems[@dept = $d and @num = $n]
where sum($items/@quantity) > 1
order by count($items)
return if (exists($items))
       then <group dept="{$d}" num="{$n}" numItems="{count($items)}"
                   totQuant="{sum($items/@quantity)}"/>
       else ( )
Results
<group dept="ACC" num="443" numItems="1" totQuant="2"/>
<group dept="WMN" num="557" numItems="2" totQuant="2"/>
<group dept="MEN" num="784" numItems="2" totQuant="2"/>

Adjusting the query was a simple matter of adding a where clause that tested the total quantity, and modifying the order by clause to use the number of items.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.142.144