XQuery allows you to create your own functions. This allows query fragments to be reused, and allows code libraries to be developed and reused by other parties. User-defined functions can also make a query more readable by separating out expressions and naming them. For a starter set of user-defined function examples, see http://www.xqueryfunctions.com.
There are many good reasons for user-defined functions, such as:
If you are evaluating the same expression repeatedly, it makes sense to define it as a separate function, and then call it from multiple places. This has the advantage of being written (and maintained) only once. If you want to change the algorithm later (for example, to accept the empty sequence or to fix a bug), you need to do so in only one place.
Functions make it clearer to the query reader what is going on. Having a function clearly named, with a set of named, typed parameters, serves as a form of documentation. A function declaration also physically separates the expression from the rest of the query, which makes it easier to decipher complex queries with many nested expressions.
It is virtually impossible to implement some algorithms without recursion. For example, if you want to generate a table of contents based on section headers, you can write a recursive function that processes section elements, their children, their grandchildren, and so on.
By encapsulating functionality such as "get all the orders for a product" into a user-defined function, applications become easier to adapt to subsequent schema changes.
The function conversion rules automatically perform some type promotions, casting, and atomization. These type conversions can be performed explicitly in the query, but sometimes it is cleaner simply to call a function.
Functions are defined using function declarations, which can appear either in the query prolog or in an external library. Example 8-1 shows a function declaration in a query prolog. The function, called local:discountPrice, accepts three arguments: a price, a discount, and a maximum discount percent. It applies the lesser of the discount and the maximum discount to the price. The last line in the example is the query body, which consists of a call to the discountPrice function.
Example 8-1. A function declaration
declare function local:discountPrice(
  $price as xs:decimal?,
  $discount as xs:decimal?,
  $maxDiscountPct as xs:integer?) as xs:decimal?
{
  let $maxDiscount := ($price * $maxDiscountPct) div 100
  let $actualDiscount := min(($maxDiscount, $discount))
  return ($price - $actualDiscount)
};
local:discountPrice($prod/price, $prod/discount, 15)
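As you can see, a function declaration consists of several parts: the keywords declare function followed by the qualified name of the function, a list of parameters enclosed in parentheses and separated by commas, the return type of the function, and a function body enclosed in curly braces and followed by a semicolon.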
A previous draft of the XQuery recommendation used the keywords define function instead of declare function. Some popular XQuery implementations still use the previous syntax.
The syntax of a function declaration is shown in Figure 8-2. The parameter list is optional, although the parentheses around the parameter list are required. The return type is also optional, but it is strongly encouraged.
The function body is an expression enclosed in curly braces, which may contain any valid XQuery expression, such as a FLWOR or a path expression. It does not have to contain a return clause; the return value is simply the value of the expression. You could have a function declaration as minimal as:
declare function local:get-pi( ) {3.141592653589};
Within a function body, a function can call other functions that are declared anywhere in the module, or in an imported module, regardless of the order of their declarations.
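As a small illustration of this order independence, the following sketch declares a function that calls another function declared after it in the prolog. The names local:answer and local:base are hypothetical, chosen only for this example.

```xquery
(: local:answer calls local:base, which is declared later in the prolog :)
declare function local:answer() as xs:integer {
  local:base() + 1
};

declare function local:base() as xs:integer {
  41
};

(: query body :)
local:answer()
```

The query body evaluates to 42, even though local:base appears after the function that uses it.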
Once the function body has been evaluated, its value is converted to the return type using the function conversion rules described in "Function Conversion Rules" in Chapter 11. If the return type is not specified, it is assumed to be item()*, that is, a possibly empty sequence of atomic values and nodes.
Each function is uniquely identified by its qualified name and its number of parameters. There can be more than one function declaration that has the same qualified name, as long as the number of parameters is different. The function name must be a valid XML name, meaning that it can start with a letter or underscore and contain letters, digits, underscores, dashes, and periods. Like other XML names, function names are case-sensitive.
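The following sketch shows two declarations that share a qualified name but differ in their number of parameters. The function local:greet is a hypothetical example, not one of the book's functions.

```xquery
(: zero-parameter version delegates to the one-parameter version :)
declare function local:greet() as xs:string {
  local:greet("world")
};

(: one-parameter version; same name, different number of parameters :)
declare function local:greet($name as xs:string) as xs:string {
  concat("Hello, ", $name)
};

(local:greet(), local:greet("XQuery"))
```

The processor chooses between the two declarations by counting the arguments in each call, so the query returns the two strings "Hello, world" and "Hello, XQuery".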
All user-defined function names must be in a namespace. In the main query module, you can use any prefix that is declared in the prolog. You can also use the predefined prefix local, which puts the function in the namespace http://www.w3.org/2005/xquery-local-functions. It can then be called from within that main module using the prefix local. On the other hand, if a function is declared in a library module, its name must be in the target namespace of the module. Library modules are discussed in "Assembling Queries from Multiple Modules" in Chapter 12.
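The two options for a main module can be sketched side by side. The prefix my and the namespace URI http://example.com/my-functions are assumptions made up for this example, as are both function names.

```xquery
(: a prefix declared in the prolog :)
declare namespace my = "http://example.com/my-functions";

declare function my:square($n as xs:integer) as xs:integer {
  $n * $n
};

(: the predefined local prefix needs no namespace declaration :)
declare function local:cube($n as xs:integer) as xs:integer {
  $n * my:square($n)
};

(my:square(7), local:cube(3))
```

Both functions are called with their prefixes; the query returns 49 and 27.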
In addition, certain function names are reserved; these are listed in Table 8-1. It is not illegal to declare functions with these names, but when called they must be prefixed. As long as you have not overridden the default function namespace, this is not an issue. However, for clarity, it is best to avoid these function names.
The syntax of a parameter list is shown in Figure 8-3. Each parameter has a unique name, and optionally a type. The name is expressed as a variable name, preceded by a dollar sign ($). When a function is called, the variable specified is bound to the value that is passed to it. For example, the function declaration:
declare function local:getProdNum ($prod as element()) as element( ) { $prod/number };
binds the $prod variable to the value of the argument passed to it. The $prod variable can be referenced anywhere in the function body.
The type is expressed as a sequence type, described earlier in this chapter. If no type is specified for a particular parameter, it allows any argument. However, it is best to specify a type, for the purposes of error checking and clarity. Some of the built-in functions use the keyword numeric to indicate that the argument may be of any numeric type. This keyword cannot be used in user-defined functions.
When the function is called, each argument value is converted to the appropriate type according to the function conversion rules.
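To make the conversion concrete, the sketch below passes an untyped price element to a function that expects an xs:decimal. The function local:withTax, the tax factor, and the inline product element are hypothetical and assume no schema is in use.

```xquery
declare function local:withTax($amount as xs:decimal?) as xs:decimal? {
  $amount * 1.08
};

let $prod := <product><price>50.00</price></product>
return (
  (: the element is atomized and its untyped value cast to xs:decimal :)
  local:withTax($prod/price),
  (: the same conversion written out explicitly in the query :)
  xs:decimal(data($prod/price)) * 1.08
)
```

Both expressions produce the same decimal value, 54; the function call simply leaves the atomization and casting to the function conversion rules.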
You may be faced with the decision of whether to accept a node that contains an atomic value, or to accept the atomic value itself. For example, in the declaration of local:discountPrice, you could have accepted the price and discountPct elements instead of accepting their xs:decimal and xs:integer values. There are some cases where it is advantageous to pass the entire element as an argument, such as if:
You want to access its attributes, for example, the currency attribute of price
You need to access its parent or siblings
However, if you are interested in only a single data value, there are a number of reasons why it is generally better to accept the atomic value:
It is more flexible, in that you can pass a node to a function that expects an atomic value, but you cannot pass an atomic value to a function that expects a node.
You can be more specific about the desired type of the value, to ensure, for example, that it is an xs:integer.
You don't have to cast untyped values to the desired type; this will happen automatically as part of the conversion.
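The trade-off can be sketched with two hypothetical functions: local:formatPrice needs the element because it reads the currency attribute, while local:isExpensive cares only about the value. The inline price element is sample data invented for this example and is assumed to be untyped.

```xquery
(: accepts the element itself because it needs the currency attribute :)
declare function local:formatPrice($price as element(price)) as xs:string {
  concat($price/@currency, " ", string($price))
};

(: accepts only the value; a node argument is atomized and cast :)
declare function local:isExpensive($amount as xs:decimal) as xs:boolean {
  $amount > 100
};

let $price := <price currency="USD">29.99</price>
return (
  local:formatPrice($price),
  local:isExpensive($price),   (: a node where an atomic value is expected :)
  local:isExpensive(250)       (: a plain atomic value works as well :)
)
```

The query returns "USD 29.99", false, and true. Calling local:formatPrice(29.99) would raise a type error, because an atomic value cannot be converted to an element.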
You may have noticed that many of the XQuery built-in functions accept the empty sequence as arguments, as evidenced by the occurrence indicators * and ?. For example, the substring function accepts the empty sequence for its first argument and returns a zero-length string if the empty sequence is passed to it. This is a flexible way of handling optional elements. If you want to take a substring of an optional number child, if it exists, you can simply specify:
substring ($product/number, 1, 5)
If the substring function were less flexible, and did not accept the empty sequence, you would be required to write:
if ($product/number) then substring ($product/number, 1, 5) else ( )
This can become quite cumbersome if you are nesting many function calls. Generally, your functions should be designed to be easily nested in this way as well.
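A minimal sketch of this design advice follows. Both functions are hypothetical; each accepts the empty sequence (note the ? occurrence indicators) and passes it through, so calls can be nested without any checks at the call site. The person element is invented sample data.

```xquery
declare function local:initial($name as xs:string?) as xs:string? {
  if (exists($name))
  then upper-case(substring($name, 1, 1))
  else ()
};

declare function local:dotted($s as xs:string?) as xs:string? {
  if (exists($s)) then concat($s, ".") else ()
};

let $person := <person><first>priscilla</first></person>
return (
  local:dotted(local:initial($person/first)),    (: first exists :)
  local:dotted(local:initial($person/middle))    (: middle is absent :)
)
```

The first nested call returns "P."; the second contributes nothing to the result, because the missing middle child flows through both functions as the empty sequence.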
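It is also important to decide how you want the function to handle arguments that are the empty sequence, if they are allowed. In some cases, it is not appropriate simply to return the empty sequence or to let the empty sequence pass through unnoticed. Using the local:discountPrice function from Example 8-1, suppose $discount is bound to the empty sequence, because $prod has no discount child. The empty sequence simply disappears from the sequence passed to min, so $actualDiscount is set to the maximum discount and the function deducts the full maximum discount from the price. If $price is the empty sequence instead, the function returns the empty sequence, because all arithmetic operations on the empty sequence return the empty sequence.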
It is more likely that you want the function to return the original price if no discount amount is provided. Example 8-2 shows a revised function declaration where special checking is done for the case where either $discount or $maxDiscountPct is the empty sequence.
Example 8-2. Handling the empty sequence
declare function local:discountPrice(
  $price as xs:decimal?,
  $discount as xs:decimal?,
  $maxDiscountPct as xs:integer?) as xs:double?
{
  (: treat a missing discount as a discount of 0 :)
  let $newDiscount := if ($discount) then $discount else 0
  (: treat a missing maximum percentage as a maximum discount of 0 :)
  let $maxDiscount := if ($maxDiscountPct)
                      then ($price * $maxDiscountPct) div 100
                      else 0
  let $actualDiscount := min(($maxDiscount, $newDiscount))
  return ($price - $actualDiscount)
};
local:discountPrice($prod/price, $prod/discount, 15)
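The way the empty sequence behaves in the two kinds of expression used by local:discountPrice can be checked in isolation. The sketch below assumes an untyped product element, invented for this example, that has a price child but no discount child.

```xquery
let $prod := <product><price>100</price></product>
return (
  (: arithmetic with the empty sequence yields the empty sequence,
     so counting the result gives 0 :)
  count($prod/price - $prod/discount),
  (: in a sequence constructor the empty sequence simply vanishes,
     so min sees only the value 15 :)
  min((15, $prod/discount))
)
```

The query returns 0 and 15. This is why a function that allows empty arguments usually needs to test for them explicitly instead of relying on the operators to do something sensible.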
Inside a function body, there is no context item, even if there is one in the part of the query that contained the function call. For example, the function shown in Example 8-3 is designed to return all the products with numbers whose second digit is greater than 5. You might think that because the function is called in an expression where the context is the product element, the function can use the simple expression number to access the number child of that product. However, since the function does not inherit the context item from the main body of the query, the processor does not have a context in which to evaluate number.
Example 8-3. Invalid use of context in a function body
declare function local:prod2ndDigit( ) as xs:string? {
  substring(number, 2, 1)
};
doc("catalog.xml")//product[local:prod2ndDigit( ) > '5']
Instead, the relevant node must be passed to the function as an argument. Example 8-4 shows a revised function that correctly accepts the desired product element as an argument and uses a path expression ($prod/number) to find the number child. The product element is passed to the function using a period (.), shorthand for the context item.
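A function written along the lines just described might look like the following sketch. It is a reconstruction from the description, not the book's listing; to keep it self-contained, it queries a small inline catalog (invented sample data) instead of catalog.xml.

```xquery
(: the product is passed in explicitly instead of relying on a context item :)
declare function local:prod2ndDigit($prod as element()?) as xs:string? {
  substring($prod/number, 2, 1)
};

let $catalog :=
  <catalog>
    <product><number>557</number></product>
    <product><number>563</number></product>
    <product><number>443</number></product>
  </catalog>
(: the period passes the current product element as the argument :)
return $catalog/product[local:prod2ndDigit(.) > '5']/number/string()
```

Only the product numbered 563 has a second digit greater than 5, so the query returns the string "563".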
Functions can recursively call themselves. For example, suppose you want to count the number of descendant elements of an element (not just the immediate children, but all the descendants). You could accomplish this using the function shown in Example 8-5.
The functx:num-descendant-elements function recursively calls itself to determine how many element children the element has, how many children its children have, and so on. The only caveat is that there must be a level at which the function stops calling itself. In this case, it will eventually reach an element that has no children, so the return clause will not be evaluated. On the other hand, declaring a function such as:
declare function local:addItUp () { 1 + local:addItUp( ) };
results in an infinite loop, which will possibly end with an "out of memory" or "stack full" error.
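By contrast, a recursive function that does terminate, of the kind described for counting descendant elements, could be sketched as follows. The name local:num-descendant-elements is a stand-in placed in the local namespace for this example; it is an assumption about how such a function might be written, not the functx declaration itself.

```xquery
declare function local:num-descendant-elements($el as element()) as xs:integer {
  (: each child counts as 1, plus however many descendants it has itself;
     for an element with no children the for loop produces nothing,
     and sum of the empty sequence is 0, which ends the recursion :)
  sum(for $child in $el/*
      return local:num-descendant-elements($child) + 1)
};

local:num-descendant-elements(<a><b><c/></b><d/></a>)
```

The sample element has three descendant elements (b, c, and d), so the query returns 3.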
You can also declare mutually recursive functions that call each other. Chapter 9 explores the use of recursive functions for making modifications to element structures.
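As a minimal sketch of mutual recursion, the two hypothetical functions below call each other until the argument reaches zero. They assume a non-negative integer argument; a negative argument would never reach the stopping condition.

```xquery
declare function local:is-even($n as xs:integer) as xs:boolean {
  if ($n = 0) then true() else local:is-odd($n - 1)
};

declare function local:is-odd($n as xs:integer) as xs:boolean {
  if ($n = 0) then false() else local:is-even($n - 1)
};

(local:is-even(10), local:is-odd(7))
```

Both calls return true. Because declaration order does not matter, local:is-even can refer to local:is-odd before it is declared.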