User-Defined Functions

XQuery allows you to create your own functions. This allows query fragments to be reused, and allows code libraries to be developed and reused by other parties. User-defined functions can also make a query more readable by separating out expressions and naming them. For a starter set of user-defined function examples, see http://www.xqueryfunctions.com.

Why Define Your Own Functions?

There are many good reasons for user-defined functions, such as:

Reuse

If you are evaluating the same expression repeatedly, it makes sense to define it as a separate function, and then call it from multiple places. This has the advantage of being written (and maintained) only once. If you want to change the algorithm later (for example, to accept the empty sequence or to fix a bug), you need to do it in only one place.

Clarity

Functions make it clearer to the query reader what is going on. Having a function clearly named, with a set of named, typed parameters, serves as a form of documentation. A function also physically separates its logic from the rest of the query, which makes it easier to decipher complex queries with many nested expressions.

Recursion

It is virtually impossible to implement some algorithms without recursion. For example, if you want to generate a table of contents based on section headers, you can write a recursive function that processes section elements, their children, their grandchildren, and so on.

Managing change

By encapsulating functionality such as "get all the orders for a product" into a user-defined function, applications become easier to adapt to subsequent schema changes.

Automatic type conversions

The function conversion rules automatically perform some type promotions, casting, and atomization. These type conversions can be performed explicitly in the query, but sometimes it is cleaner simply to call a function.

Function Declarations

Functions are defined using function declarations, which can appear either in the query prolog or in an external library. Example 8-1 shows a function declaration in a query prolog. The function, called local:discountPrice, accepts three arguments: a price, a discount, and a maximum discount percent. It applies the lesser of the discount and the maximum discount to the price. The last line in the example is the query body, which consists of a call to the discountPrice function.

Example 8-1. A function declaration

declare function local:discountPrice(
  $price as xs:decimal?,
  $discount as xs:decimal?,
  $maxDiscountPct as xs:integer?) as xs:decimal?

{
   let $maxDiscount := ($price * $maxDiscountPct) div 100
   let $actualDiscount := min(($maxDiscount, $discount))
   return ($price - $actualDiscount)
};

local:discountPrice($prod/price, $prod/discount, 15)

As you can see, a function declaration consists of several parts:

  • The keywords declare function followed by the qualified function name

  • A list of parameters enclosed in parentheses and separated by commas

  • The return type of the function

  • A function body enclosed in curly braces and followed by a semicolon

Important

A previous draft of the XQuery recommendation used the keywords define function instead of declare function. Some popular XQuery implementations still use the previous syntax.

The syntax of a function declaration is shown in Figure 8-2. The parameter list is optional, although the parentheses around the parameter list are required. The return type is also optional, but it is strongly encouraged.

Figure 8-2. Syntax of a function declaration[a]

The Function Body

The function body is an expression enclosed in curly braces. It can be any valid XQuery expression, such as a FLWOR or a path expression. It does not have to contain a return clause; the return value is simply the value of the expression. You could have a function declaration as minimal as:

declare function local:get-pi( ) {3.141592653589};

Within a function body, a function can call other functions that are declared anywhere in the module, or in an imported module, regardless of the order of their declarations.
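As a minimal sketch of this ordering rule, the query below calls a function that is declared later in the same prolog. The function local:circle-area is a hypothetical name introduced only for illustration, and local:get-pi is repeated here (with an added return type) so that the sketch is self-contained.

declare function local:circle-area($radius as xs:decimal) as xs:decimal {
   (: local:get-pi is declared after this function, which is allowed :)
   local:get-pi() * $radius * $radius
};
declare function local:get-pi() as xs:decimal { 3.141592653589 };

local:circle-area(2.0)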

Once the function body has been evaluated, its value is converted to the return type using the function conversion rules described in "Function Conversion Rules" in Chapter 11. If the return type is not specified, it is assumed to be item()*, that is, a possibly empty sequence of atomic values and nodes.
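The following sketch illustrates the conversion of a return value. It assumes untyped (not schema-validated) input, and local:price-of is a hypothetical function name. The body evaluates to a price element, but because the declared return type is xs:decimal?, the element is atomized and its untyped value is cast to xs:decimal before it is returned.

declare function local:price-of($prod as element()) as xs:decimal? {
   (: the body yields an element; the conversion rules turn it into a decimal :)
   $prod/price
};

(: the returned value can be used directly in arithmetic :)
local:price-of(<product><price>29.99</price></product>) + 1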

The Function Name

Each function is uniquely identified by its qualified name and its number of parameters. There can be more than one function declaration that has the same qualified name, as long as the number of parameters is different. The function name must be a valid XML name, meaning that it must start with a letter or underscore and can contain letters, digits, underscores, dashes, and periods. Like other XML names, function names are case-sensitive.
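For instance, the two declarations sketched below share a qualified name but differ in their number of parameters, so they are distinct functions. The name local:full-name is hypothetical and used only for illustration.

declare function local:full-name($first as xs:string) as xs:string {
   $first
};
declare function local:full-name($first as xs:string,
                                 $last as xs:string) as xs:string {
   concat($first, " ", $last)
};

(: the number of arguments in each call selects the matching declaration :)
(local:full-name("Ada"), local:full-name("Ada", "Lovelace"))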

All user-defined function names must be in a namespace. In the main query module, you can use any prefix that is declared in the prolog. You can also use the predefined prefix local, which puts the function in the namespace http://www.w3.org/2005/xquery-local-functions. It can then be called from within that main module using the prefix local. On the other hand, if a function is declared in a library module, its name must be in the target namespace of the module. Library modules are discussed in "Assembling Queries from Multiple Modules" in Chapter 12.

In addition, certain function names are reserved; these are listed in Table 8-1. It is not illegal to declare functions with these names, but when called they must be prefixed. As long as you have not overridden the default function namespace, this is not an issue. However, for clarity, it is best to avoid these function names.

Table 8-1. Reserved function names

attribute        if                       schema-attribute
comment          item                     schema-element
document-node    node                     text
element          processing-instruction   typeswitch
empty-sequence
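As a sketch of the prefixing rule for reserved names, the hypothetical function below is named text, which is one of the reserved names. Declaring it is permitted, but every call has to use the prefix. This is shown only to illustrate the rule; as noted above, such names are best avoided.

declare function local:text($n as node()) as xs:string {
   string($n)
};

(: the call must be written with the prefix, as local:text(...) :)
local:text(<name>Fleece Pullover</name>)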

  

The Parameter List

The syntax of a parameter list is shown in Figure 8-3. Each parameter has a unique name, and optionally a type. The name is expressed as a variable name, preceded by a dollar sign ($). When a function is called, the variable specified is bound to the value that is passed to it. For example, the function declaration:

declare function local:getProdNum ($prod as element()) as element( )
   { $prod/number };

binds the $prod variable to the value of the argument passed to it. The $prod variable can be referenced anywhere in the function body.

Figure 8-3. Syntax of a parameter list

The type is expressed as a sequence type, described earlier in this chapter. If no type is specified for a particular parameter, it allows any argument. However, it is best to specify a type, for the purposes of error checking and clarity. Some of the built-in functions use the keyword numeric to indicate that the argument may be of any numeric type. This keyword cannot be used in user-defined functions.
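As a small sketch of an untyped parameter, the hypothetical function local:count-items below declares no type for $items, so any argument is accepted: the empty sequence, a single atomic value, or a mixed sequence of atomic values and nodes.

declare function local:count-items($items) as xs:integer {
   (: $items has no declared type, so any sequence may be passed :)
   count($items)
};

(local:count-items(()),
 local:count-items("a"),
 local:count-items((1, 2, <x/>)))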

When the function is called, each argument value is converted to the appropriate type according to the function conversion rules.

Accepting arguments that are nodes versus atomic values

You may be faced with the decision of whether to accept a node that contains an atomic value, or to accept the atomic value itself. For example, in the declaration of local:discountPrice, you could have accepted the price and discount elements instead of accepting their xs:decimal values. There are some cases where it is advantageous to pass the entire element as an argument, such as if:

  • You want to access its attributes—for example, to access the currency attribute of price

  • You need to access its parent or siblings

However, if you are interested in only a single data value, there are a number of reasons why it is generally better to accept the atomic value:

  • It is more flexible, in that you can pass a node to a function that expects an atomic value, but you cannot pass an atomic value to a function that expects a node.

  • You can be more specific about the desired type of the value, to ensure, for example, that it is an xs:integer.

  • You don't have to cast untyped values to the desired type; this will happen automatically as part of the conversion.
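The flexibility described in the list above can be sketched as follows. The function local:with-tax and the 1.08 multiplier are hypothetical, and the input is assumed to be untyped. Because the parameter is declared as xs:decimal?, the same function can be called with either a price element or a decimal value.

declare function local:with-tax($price as xs:decimal?) as xs:decimal? {
   $price * 1.08
};

let $prod := <product><price currency="USD">29.99</price></product>
return (
   (: a node is passed: it is atomized and cast to xs:decimal :)
   local:with-tax($prod/price),
   (: an atomic value is passed directly :)
   local:with-tax(29.99)
)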

Accepting arguments that are the empty sequence

You may have noticed that many of the XQuery built-in functions accept the empty sequence as an argument, as evidenced by the occurrence indicators * and ?. For example, the substring function accepts the empty sequence for its first argument and returns a zero-length string when the empty sequence is passed to it. This is a flexible way of handling optional elements. If you want to take a substring of an optional number child, if it exists, you can simply specify:

substring ($product/number, 1, 5)

If the substring function were less flexible, and did not accept the empty sequence, you would be required to write:

if    ($product/number)
then  substring ($product/number, 1, 5)
else  ( )

This can become quite cumbersome if you are nesting many function calls. Generally, your functions should be designed to be easily nested in this way as well.
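A user-defined function can follow the same design. In the sketch below, the hypothetical function local:abbrev marks its parameter with ? so that callers can pass an optional child element without testing for it first. Returning the empty sequence for a missing value is one possible design choice, assumed here for illustration.

declare function local:abbrev($name as xs:string?) as xs:string? {
   (: propagate a missing value instead of raising a type error :)
   if (exists($name))
   then concat(substring($name, 1, 3), ".")
   else ()
};

(: the second product has no name child, so it contributes nothing :)
for $product in (<product><name>Fleece Pullover</name></product>,
                 <product/>)
return local:abbrev($product/name)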

It is also important to decide how you want the function to handle arguments that are the empty sequence, if they are allowed. In some cases, it is not appropriate simply to let the empty sequence flow through the function body. Using the local:discountPrice function from Example 8-1, suppose $discount is bound to the empty sequence, because $prod has no discount child. The sequence passed to min then contains only $maxDiscount, so the function silently applies the full maximum discount. If $price or $maxDiscountPct is the empty sequence instead, the function returns the empty sequence, because arithmetic operations on the empty sequence return the empty sequence.

It is more likely that you want the function to return the original price if no discount amount is provided. Example 8-2 shows a revised function declaration where special checking is done for the case where either $discount or $maxDiscountPct is the empty sequence.

Example 8-2. Handling the empty sequence

declare function local:discountPrice(
  $price as xs:decimal?,
  $discount as xs:decimal?,
  $maxDiscountPct as xs:integer?) as xs:double?
{
   (: treat a missing discount as zero :)
   let $newDiscount    := if ($discount) then $discount else 0
   (: treat a missing maximum percentage as a maximum discount of zero :)
   let $maxDiscount    := if ($maxDiscountPct)
                          then ($price * $maxDiscountPct) div 100
                          else 0
   let $actualDiscount := min(($maxDiscount, $newDiscount))
   return ($price - $actualDiscount)
};

local:discountPrice($prod/price, $prod/discount, 15)

Functions and Context

Inside a function body, there is no context item, even if there is one in the part of the query that contained the function call. For example, the function shown in Example 8-3 is designed to return all the products with numbers whose second digit is greater than 5. You might think that because the function is called in an expression where the context is the product element, the function can use the simple expression number to access the number child of that product. However, since the function does not inherit the context item from the main body of the query, the processor does not have a context in which to evaluate number.

Example 8-3. Invalid use of context in a function body

declare function local:prod2ndDigit( ) as xs:string? {
    substring(number, 2, 1)
};
doc("catalog.xml")//product[local:prod2ndDigit( ) > '5']

Instead, the relevant node must be passed to the function as an argument. Example 8-4 shows a revised function that correctly accepts the desired product element as an argument and uses a path expression ($prod/number) to find the number child. The product element is passed to the function using a period (.), shorthand for the context item.

Example 8-4. Passing the context item to the function

declare function local:prod2ndDigit($prod as element( )?) as xs:string? {
    substring($prod/number, 2, 1)
};
doc("catalog.xml")//product[local:prod2ndDigit(.) > '5']

Recursive Functions

Functions can recursively call themselves. For example, suppose you want to count the number of descendant elements of an element (not just the immediate children, but all the descendants). You could accomplish this using the function shown in Example 8-5.

Example 8-5. A recursive function

declare namespace functx = "http://www.functx.com";
declare function functx:num-descendant-elements
  ($el as element( )) as xs:integer {
    sum(for $child in $el/*
        return functx:num-descendant-elements($child) + 1)
};

The functx:num-descendant-elements function recursively calls itself to determine how many element children the element has, how many children its children have, and so on. The only caveat is that there must be a level at which the function stops calling itself. In this case, it will eventually reach an element that has no children, so the return clause will not be evaluated. On the other hand, declaring a function such as:

declare function local:addItUp () { 1 + local:addItUp( ) };

results in an infinite loop, which will possibly end with an "out of memory" or "stack full" error.
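One way to make such a function terminate is to give it an argument that moves toward a stopping condition on every call. The sketch below is an assumed variation of local:addItUp, not a declaration from earlier in the chapter; the $n parameter is introduced only for illustration.

declare function local:addItUp($n as xs:integer) as xs:integer {
   if ($n le 0)
   then 0                             (: base case: stop recursing :)
   else 1 + local:addItUp($n - 1)     (: each call moves closer to the base case :)
};

local:addItUp(5)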

You can also declare mutually recursive functions that call each other. Chapter 9 explores the use of recursive functions for making modifications to element structures.
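As a minimal sketch of mutual recursion, the two hypothetical functions below call each other until the argument reaches zero. The sketch assumes that $n is not negative; a negative argument would never reach the stopping condition.

declare function local:is-even($n as xs:integer) as xs:boolean {
   if ($n eq 0) then true() else local:is-odd($n - 1)
};
declare function local:is-odd($n as xs:integer) as xs:boolean {
   if ($n eq 0) then false() else local:is-even($n - 1)
};

local:is-even(10)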



[a] The syntax of <param-list> is shown in Figure 8-3.
