In addition to simply regrouping items, it is often desirable to perform calculations on the groups. For example, suppose you want to know the number of item
elements in a department, or the sum of the quantities for a department. This type of aggregation can be performed using the aggregation functions. Example 7-10 shows some of these functions in action.
Example 7-10. Aggregation
Query for $d in distinct-values(doc("order.xml")//item/@dept) let $items := doc("order.xml")//item[@dept = $d] order by $d return <department code="{$d}" numItems="{count($items)}" distinctItemNums="{count(distinct-values($items/@num))}" totQuant="{sum($items/@quantity)}"/> Results <department code="ACC" numItems="2" distinctItemNums="2" totQuant="3"/> <department code="MEN" numItems="2" distinctItemNums="1" totQuant="2"/> <department code="WMN" numItems="2" distinctItemNums="1" totQuant="2"/>
Here is how the aggregation functions work:
count
This function is used to determine the number of items in the sequence. In Example 7-10, the count
function is used to calculate the value of numItems
, which is the number of items in the department. It is also used to calculate the value of distinctItemNums
. In the latter case, the count
function is combined with the distinct-values
function to count only the unique numbers in that department.
sum
This function is used to determine the total value of the items in a sequence. In Example 7-10, the sum
function is used to calculate the value of totQuant
, the sum of all the quantity attributes for that department.
min
and max
These functions are used to determine the minimum and maximum values of the items in the sequence.
avg
This function is used to determine the average value of the items in a sequence.
The sum
and avg
functions accept values that are all numeric, all xs:yearMonthDuration
values, or all xs:dayTimeDuration
values. The max
and min
functions accept values of any type that is ordered (i.e., values can be compared using <
and >
). This includes strings, dates, and many other types.
The sum
, min
, max
, and avg
functions treat untyped data as numeric. This means that if you are not using a schema, and you want to find a maximum string value, you need to use an expression like:
max(doc("order.xml")//item/string(
@dept)
)
which uses the string
function to convert each value to xs:string
before the comparison.
The sequence passed to an aggregation function may contain nodes that are zero-length strings, even though the user might think of them as "missing" values. For example, the minimum value of the color
attribute in order.xml
is black
. However, if there had been an item with a color
attribute whose value was a zero-length string (as in color=""
), the min
function would have returned a zero-length string.
On the other hand, there may be cases where you want "missing" values to be taken into consideration, but they are not. For example, the avg
function ignores any absent nodes. If you want the average product discount, and you use:
avg(doc("prices.xml")//discount)
you get the average of the two discount values. It does not take into account the fact that there are three products, and that you might want the discount to be counted as zero for the product with no discount
child. To count absent discount
children as zero, you need to calculate the average explicitly, using:
sum(doc("prices.xml")//prod/discount) div count(doc("prices.xml")//prod)
On the other hand, if a prod
had an empty discount
child (i.e., <discount></discount>
), it would be considered a zero-length string and the avg
function would raise an error because this value is not of a numeric or duration type. In that case, you would have to test for missing values using:
avg(doc("prices.xml")//prod/discount[. != ""
])
So far, the aggregation examples assume that you want to group on one value, the dept
attribute. Suppose you want to group on two values: the dept
attribute and the num
attribute. You would like to know the number of items and total quantity for each department/product number combination. This could be accomplished using the query shown in Example 7-11.
Example 7-11. Aggregation on multiple values
Query let $allItems := doc("order.xml")//item for $d in distinct-values($allItems/@dept) for $n in distinct-values($allItems[@dept = $d]/@num) let $items := $allItems[@dept = $d and @num = $n] order by $d, $n return <group dept="{$d}" num="{$n}" numItems="{count($items)}" totQuant="{sum($items/@quantity)}"/> Results <group dept="ACC" num="443" numItems="1" totQuant="2"/> <group dept="ACC" num="563" numItems="1" totQuant="1"/> <group dept="MEN" num="784" numItems="2" totQuant="2"/> <group dept="WMN" num="557" numItems="2" totQuant="2"/>
The query uses two for
clauses to obtain two separate lists of distinct departments and distinct numbers. Using two for
clauses results in the rest of the FLWOR being evaluated once for every possible combination of department and product number. The second for
clause uses a predicate to choose only numbers that exist in that particular department.
The let
clause binds $items
to a list of items that exist for that department/number combination. If any items exist in $items
, the return
clause returns a group
element with the summarized information for that combination.
In addition to returning aggregated values in the query results, you can constrain and sort the results on the aggregated values. Suppose you want to return the similar results to those shown in Example 7-11, but you only want the groups whose total quantity (totQuant
) is greater than 1, and you want the results sorted by the number of items (numItems
). The query shown in Example 7-12 accomplishes this.
Example 7-12. Constraining and sorting on aggregated values
Query let $allItems := doc("order.xml")//item for $d in distinct-values($allItems/@dept) for $n in distinct-values($allItems/@num) let $items := $allItems[@dept = $d and @num = $n] where sum($items/@quantity) > 1 order by count($items) return if (exists($items)) then <group dept="{$d}" num="{$n}" numItems="{count($items)}" totQuant="{sum($items/@quantity)}"/> else ( ) Results <group dept="ACC" num="443" numItems="1" totQuant="2"/> <group dept="WMN" num="557" numItems="2" totQuant="2"/> <group dept="MEN" num="784" numItems="2" totQuant="2"/>
Adjusting the query was a simple matter of adding a where
clause that tested the total quantity, and modifying the order by
clause to use the number of items.
18.117.142.144