Chapter 8. Functions

Functions are a useful feature of XQuery that provide a wide array of built-in functionality, as well as the ability to modularize and reuse parts of queries. There are two kinds of functions: built-in functions and user-defined functions.

Built-in Versus User-Defined Functions

The built-in functions are a standard set supported by all XQuery implementations. A detailed description of each built-in function is provided in Appendix A, and most are also discussed at appropriate places in the book.

A user-defined function is one that is specified by a query author, either in the query itself, or in an external library. The second half of this chapter explains how to define your own functions in detail.

Calling Functions

The syntax of a function call, shown in Figure 8-1, is the same whether it is a built-in function or a user-defined function. It is the qualified name of the function, followed by a parenthesized list of the arguments, separated by commas. An argument is the actual value that is passed to a function, while a parameter is its definition. For example, to call the substring function, you might use:

substring($prodName, 1, 5)
Figure 8-1. Syntax of a function call

Function calls can be included anywhere an expression is permitted. For example, you might include a function call in a let clause, as in:

let $name := substring($prodName, 1, 5)

or in element constructor content:

<name>{substring($prodName, 1, 5)}</name>

or in the predicate of a path expression:

doc("catalog.xml")/catalog/product[substring(name, 1, 6) = 'Fleece']

Function Names

Functions have namespace-qualified names. Most of the built-in function names are in the XPath Functions Namespace, http://www.w3.org/2005/xpath-functions. Since this is the default namespace for functions, these built-in functions can be referenced without a namespace prefix (unless you have overridden the default function namespace, which is not recommended). Some XQuery users still prefer to use the fn prefix for these functions, but this is normally unnecessary.

A number of built-in functions that were introduced in versions 3.0 and 3.1 are in namespaces commonly associated with the prefixes math, map, and array. These function names need to be prefixed when called, and the appropriate namespaces need to be declared, as shown in the following query, which declares the math namespace and calls the math:exp function with the declared prefix:

declare namespace math = "http://www.w3.org/2005/xpath-functions/math";
math:exp(12)

If a function is user-defined, it must be called by its prefixed name. If a function is declared in the same query module, you can call it by using the same prefixed name found in the declaration. Some functions may use the local prefix, a built-in prefix for locally declared functions. To call these functions, you use the local prefix in the name, as in:

declare function local:return2 () as xs:integer {2};
<size>{local:return2()}</size>

If the function is in a separate library, it may have a different namespace that needs to be declared. For example, if you are calling a function named trim in the namespace http://datypic.com/strings, you must declare that namespace and use the appropriate prefix when calling the function, as in:

import module namespace strings = "http://datypic.com/strings"
                     at "strings.xqm";
for $prod in doc("catalog.xml")//product
return strings:trim($prod/name)

Function Signatures

A function signature is used to describe the inputs and outputs of a function. For example, the signature of the built-in upper-case function is:

upper-case($arg as xs:string?) as xs:string

The signature indicates:

  • The name of the function, in this case, upper-case.

  • The list of parameters. In this case, there is only one, whose name is $arg and whose type is xs:string?. The question mark after xs:string indicates that the function accepts a single xs:string value or the empty sequence.

  • The return type of the function, in this case, xs:string.

There may be several signatures associated with the same function name, with a different number of parameters (arity). For example, there are two signatures for the substring function:

substring($sourceString as xs:string?,
          $start as xs:double) as xs:string
substring($sourceString as xs:string?,
          $start as xs:double,
          $length as xs:double) as xs:string

The second signature has one additional parameter, $length.

Argument Lists

When calling a function, there must be an argument for every parameter specified in the function signature. If there is more than one signature, as in the case of the substring function, the argument list may match either function signature. If the function does not take any arguments, the parentheses are still required, although there is nothing between them, as in:

current-date()

You are not limited to simple variable names and literals in a function call. You can have complex, nested expressions that are evaluated before evaluation of the function. For example, the following function call has one argument that is itself a function call, and another argument that is a conditional (if) expression:

concat(substring($name, 1, $sublen), if ($addT) then "T" else "")

Calling a function never changes the value of any of the variables that are passed to it. In the preceding example, the value of $name does not change during evaluation of the substring function.

Argument lists and the empty sequence

Passing the empty sequence or a zero-length string for an argument is not the same as omitting an argument. For example:

substring($myString, 2)

is not the same as:

substring($myString, 2, ())

The first function call matches the first signature of substring, and therefore returns a substring of $myString starting at position 2. The second matches the second signature of substring, which takes three arguments. This function call raises type error XPTY0004 because the third argument of the substring function must be an xs:double value, and cannot be the empty sequence.

Conversely, if an argument can be the empty sequence, this does not mean it can be omitted. For example, the upper-case function expects one argument, which can be the empty sequence. It is not acceptable to use upper-case(), although it is acceptable to use upper-case( () ), because the inner parentheses () represent the empty sequence.

Argument lists and sequences

The syntax of an argument list is similar to the syntax of a sequence constructor, and it is important not to confuse the two. Each expression in the argument list (separated by a comma) is considered a single argument. A sequence passed to a function is considered a single argument, not a list of arguments. Some functions expect sequences as arguments. For example, the max function, whose one-argument signature is:

max($arg as xs:anyAtomicType*) as xs:anyAtomicType?

expects one argument that is a sequence. Therefore, an appropriate call to max is:

max ( (1, 2, 3) )

not:

max (1, 2, 3)

which is attempting to pass it three arguments.

Conversely, it is not acceptable to pass a sequence to a function that expects several arguments that are atomic values. For example, in:

substring( ($myString, 2) )

the argument list contains only one argument, which happens to be a sequence of two items, because of the extra parentheses. This raises type error XPTY0004 because the function expects two (or three) arguments.

You may want to pass a sequence of multiple items to a function to apply the function to each of those items. For example, to take the substring of each of the product names, you might be tempted to write:

substring( doc("catalog.xml")//name, 1, 3)

but this won’t work because the first argument of substring is not allowed to contain more than one item. Instead, you could use a path expression, as in:

doc("catalog.xml")//name/substring(., 1, 3 )

which will return a sequence of four strings: Fle, Flo, Del, and Cot.

Sequence Types

The types of parameters are expressed as sequence types, which specify both the number and type (and/or node kind) of items that make up the parameter. The most commonly used sequence types are the name of a specific atomic type, such as xs:integer, xs:double, xs:date, or xs:string. The sequence type xs:anyAtomicType, which allows any atomic value, or xs:numeric, which allows any number, can also be specified.

Occurrence indicators are used to indicate how many items can be in a sequence. The occurrence indicators are:

  • ? For zero or one items

  • * For zero, one, or more items

  • + For one or more items

If no occurrence indicator is specified, it is assumed that it means one and only one. For example, a sequence type of xs:integer matches one and only one atomic value of type xs:integer. A sequence type of xs:string* matches a sequence that is either the empty sequence, or contains one or more atomic values of type xs:string. Sequence types are covered in detail in “Sequence Types”.

Remember that there is no difference between an item and a sequence that contains only that item. If a function expects xs:string* (a sequence of zero to many strings), it is perfectly acceptable to pass it a single string such as "xyz".
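As a small illustration of this equivalence, the following sketch passes a single string and a parenthesized one-item sequence to built-in functions that accept a sequence as their first argument. The variable names are chosen only for this example.

```xquery
xquery version "3.1";

(: A single item and a one-item sequence are the same value. :)
let $single  := "xyz"
let $wrapped := ("xyz")
return (
  count($single),                (: 1 :)
  count($wrapped),               (: 1 :)
  string-join($single, "-"),     (: "xyz": only one string, so no separator appears :)
  deep-equal($single, $wrapped)  (: true :)
)
```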

When you call a function, sometimes the type of an argument differs from the type specified in the function signature. For example, you may pass an xs:integer to a function that expects an xs:decimal. Alternatively, you may pass an element that contains a string to a function that expects just the string itself. XQuery defines rules, known as function conversion rules, for converting arguments to the expected type. The function conversion rules are covered in detail in “Function Conversion Rules”.

Not all arguments can be converted using the function conversion rules, because function conversion does not involve straight casting from one type to another. For example, you cannot pass a string to a function that expects an integer. If you attempt to pass an argument that does not match the sequence type specified in the function signature, type error XPTY0004 is raised.
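The following sketch shows which kinds of arguments these rules are likely to accept. The function local:half is hypothetical, and the example assumes untyped (non-schema-validated) data, so the content of the constructed n element is treated as an untyped value.

```xquery
xquery version "3.1";

(: local:half is a hypothetical function used only for illustration. :)
declare function local:half($n as xs:decimal) as xs:decimal {
  $n div 2
};

(
  local:half(10),                (: xs:integer is derived from xs:decimal, so it is accepted as is :)
  local:half(<n>10</n>),         (: the element is atomized and its untyped value is cast to xs:decimal :)
  local:half(xs:decimal("10"))   (: a string must be cast explicitly before the call :)
  (: local:half("10") would raise XPTY0004, because an xs:string is not converted :)
)
```

Each of the three calls should return the decimal value 5. The commented-out call fails because a value that is already typed as xs:string is never cast implicitly.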

Calling Functions with the Arrow Operator

The arrow operator (=>), introduced in version 3.1, allows another syntax for calling functions. For example, instead of the function call upper-case('abc'), you can specify 'abc'=>upper-case(). This means that the upper-case function should be applied to the item to the left of the operator, in this case the string abc.

If a function takes more than one argument, these additional arguments are moved up a position in the function call. For example, instead of substring('abc', 1, 2), you can specify 'abc'=>substring(1, 2), where 'abc' is the first argument, 1 is the second argument, and 2 is the third argument.

The arrow operator is especially useful for chaining together multiple function calls. For example, instead of:

tokenize(normalize-space(replace($string, 'a', 'b')), "\s")

it is much clearer to say:

$string=>replace('a', 'b')=>normalize-space()=>tokenize("\s")

The expression to the left of the arrow operator can return a sequence of multiple items. In this case, the function is called once using the entire sequence as its first argument, as opposed to calling the function once per item in the sequence.
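To make the difference concrete, the sketch below contrasts the arrow operator with the simple map operator (!), which is not covered in this section but does call a function once per item. The variable $words and its values are made up for the example.

```xquery
xquery version "3.1";

let $words := ("fleece", "floppy", "deluxe")
return (
  (: The whole sequence becomes the first argument of string-join: one call. :)
  $words => string-join("-"),

  (: The simple map operator calls upper-case once per item. :)
  $words ! upper-case(.)

  (: $words => upper-case() would raise a type error, because upper-case
     accepts at most one string as its argument. :)
)
```

The first expression should produce the single string fleece-floppy-deluxe, and the second a sequence of three uppercase strings.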

User-Defined Functions

XQuery allows you to create your own functions. This allows query fragments to be reused, and allows code libraries to be developed and reused by other parties. User-defined functions can also make a query more readable by separating out expressions and naming them. For a starter set of user-defined function examples, see the FunctX library at http://www.xqueryfunctions.com.

Why Define Your Own Functions?

There are many good reasons for user-defined functions, such as:

Reuse

If you are evaluating the same expression repeatedly, it makes sense to define it as a separate function, and then call it from multiple places. This has the advantage of being written (and maintained) only once. If you want to change the algorithm later (for example, to accept the empty sequence or to fix a bug), you need to do so in only one place.

Clarity

Functions make it clearer to the query reader what is going on. Having a function clearly named, with a set of named, typed parameters, serves as a form of documentation. Defining a function also physically separates its logic from the rest of the query, which makes it easier to decipher complex queries with many nested expressions.

Recursion

It is virtually impossible to implement some algorithms without recursion. For example, if you want to generate a table of contents based on section headers, you can write a recursive function that processes section elements, their children, their grandchildren, and so on.

Managing change

By encapsulating functionality such as “get all the orders for a product” into a user-defined function, applications become easier to adapt to subsequent schema changes.

Automatic type conversions

The function conversion rules automatically perform some type promotions, casting, and atomization. These type conversions can be performed explicitly in the query, but sometimes it is cleaner simply to call a function.
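To illustrate the recursion case mentioned above, here is a minimal sketch of a table-of-contents function. The function name local:toc and the section and title element names are assumptions made for this example, not part of any particular schema.

```xquery
xquery version "3.1";

(: local:toc is a hypothetical function; the section and title element
   names are assumptions about the input document. :)
declare function local:toc($sec as element(section)) as element(li) {
  <li>{
    string($sec/title),
    if ($sec/section)
    then <ul>{ for $child in $sec/section return local:toc($child) }</ul>
    else ()
  }</li>
};

let $doc :=
  <section>
    <title>Functions</title>
    <section><title>Calling Functions</title></section>
    <section>
      <title>User-Defined Functions</title>
      <section><title>Recursive Functions</title></section>
    </section>
  </section>
return <ul>{ local:toc($doc) }</ul>
```

The function handles sections nested to any depth, because each call processes one section and delegates its child sections to further calls.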

Function Declarations

Functions are defined using function declarations, which can appear either in the query prolog or in an external library. Example 8-1 shows a function declaration in a query prolog. The function, called local:discountPrice, accepts three arguments: a price, a discount, and a maximum discount percent. It applies the lesser of the discount and the maximum discount to the price. The last two lines in the example are the query body, which calls the local:discountPrice function.

Example 8-1. A function declaration
declare function local:discountPrice(
  $price as xs:decimal?,
  $discount as xs:decimal?,
  $maxDiscountPct as xs:integer?) as xs:decimal?

{
   let $maxDiscount := ($price * $maxDiscountPct) div 100
   let $actualDiscount := min( ($maxDiscount, $discount) )
   return ($price - $actualDiscount)
};

let $prod := doc("prices.xml")//prod[1]
return local:discountPrice($prod/price, $prod/discount, 15)

The syntax of a function declaration is shown in Figure 8-2. As you can see, a function declaration consists of several parts:

  • The keyword declare

  • An optional %public or %private annotation, described in “Private Functions and Variables”. Other annotations are also allowed here, as described in “Annotations”.

  • The keyword function followed by the qualified function name

  • A list of parameters enclosed in parentheses and separated by commas. The parameter list is optional, although the parentheses around the parameter list are required.

  • An optional as clause, which declares the return type of the function. Although it can be omitted, declaring a return type is strongly encouraged.

  • A function body enclosed in curly braces and followed by a semicolon

Figure 8-2. Syntax of a function declaration

The Function Body

The function body is an expression enclosed in curly braces. It may be any valid XQuery expression, such as a FLWOR or a path expression. It does not have to contain a return clause; the return value is simply the value of the expression. You could have a function declaration as minimal as:

declare function local:get-2() {2};

Within a function body, a function can call other functions that are declared anywhere in the module, or in an imported library module, regardless of the order of their declarations.

Once the function body has been evaluated, its value is converted to the return type by using the function conversion rules described in “Function Conversion Rules”. If the return type is not specified, it is assumed to be item()*, that is, a possibly empty sequence of items of any kind.
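The effect of the return type can be seen in the following sketch, in which two hypothetical functions have the same body and differ only in whether a return type is declared. It assumes untyped input data.

```xquery
xquery version "3.1";

(: Both functions are hypothetical and differ only in the declared return type. :)
declare function local:price-as-decimal($prod as element()) as xs:decimal? {
  $prod/price    (: atomized and cast to xs:decimal on return :)
};

declare function local:price-as-item($prod as element()) {
  $prod/price    (: no return type, so the price element itself is returned :)
};

let $prod := <product><price>29.99</price></product>
return (
  local:price-as-decimal($prod) instance of xs:decimal,   (: true :)
  local:price-as-item($prod) instance of element(price)   (: true :)
)
```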

The Function Name

Each function is uniquely identified by its qualified name and its arity (number of parameters). There can be more than one function declaration that has the same qualified name, as long as the arity is different. The function name must be a valid XML name, meaning that it must start with a letter or underscore and can contain letters, digits, underscores, hyphens, and periods. Like other XML names, function names are case-sensitive.
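For instance, the following sketch declares two functions that share a name but differ in arity. The name local:greet is made up for the example.

```xquery
xquery version "3.1";

(: local:greet is a hypothetical name, declared twice with different arity. :)
declare function local:greet() as xs:string {
  local:greet("world")
};

declare function local:greet($name as xs:string) as xs:string {
  concat("Hello, ", $name, "!")
};

(local:greet(), local:greet("XQuery"))
```

The zero-argument version simply supplies a default value and delegates to the one-argument version, which is a common way to simulate optional parameters.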

All user-defined function names must be in a namespace. In the main query module, you can use any prefix that is declared in the prolog. You can also use the predeclared prefix local, which puts the function in the namespace http://www.w3.org/2005/xquery-local-functions. It can then be called from within that main module, using the prefix local. On the other hand, if a function is declared in a library module, its name must be in the target namespace of the module. Library modules are discussed in “Assembling Queries from Multiple Modules”.
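As a brief sketch of both options in a main module, the following query declares one function under a prefix declared in the prolog and another under the predeclared local prefix. The namespace URI, the prefix my, and the function names are placeholders.

```xquery
xquery version "3.1";

(: The namespace URI and the prefix are placeholders for illustration. :)
declare namespace my = "http://example.com/my-functions";

declare function my:double($n as xs:integer) as xs:integer {
  $n * 2
};

declare function local:triple($n as xs:integer) as xs:integer {
  $n * 3
};

(my:double(4), local:triple(4))
```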

Certain namespaces that are built into the specifications are reserved. It is not possible to define functions in the XML Schema namespace, for example, or any of the namespaces of the built-in functions. Attempting to do this results in error XQST0045.

In addition, certain function names are reserved; these are listed in Table 8-1. It is not an error to declare functions with these names, but when called they must be prefixed. As long as you have not overridden the default function namespace, this is not an issue. However, for clarity, it is best to avoid these function names.

Table 8-1. Reserved function names
array             function          processing-instruction
attribute         if                schema-attribute
comment           item              schema-element
document-node     map               switch
element           namespace-node    text
empty-sequence    node              typeswitch

The Parameter List

The syntax of a parameter list is shown in Figure 8-3. Each parameter has a unique name, and optionally a type. The name is expressed as a variable name, preceded by a dollar sign ($). When a function is called, the variable specified is bound to the value that is passed to it. For example, the function declaration:

declare function local:getProdNum ($prod as element()) as element()
   { $prod/number };

binds the $prod variable to the value of the argument passed to it. The $prod variable can be referenced anywhere in the function body.

Figure 8-3. Syntax of a parameter list

The type is expressed as a sequence type, described earlier in this chapter. If no type is specified for a particular parameter, it allows any argument. However, it is best to specify a type for the purposes of error checking and clarity.

When the function is called, each argument value is converted to the appropriate type according to the function conversion rules.

Accepting arguments that are nodes versus atomic values

You may be faced with the decision of whether to accept a node that contains an atomic value, or to accept the atomic value itself. For example, in the declaration of local:discountPrice, you could have accepted the price and discount elements instead of accepting their xs:decimal values. In some cases, it is advantageous to pass the entire element as an argument, such as if:

  • You want to access its attributes—for example, to access the currency attribute of price

  • You need to access its parent or siblings

However, if you are interested in only a single data value, there are a number of reasons why it is generally better to accept the atomic value:

  • It is more flexible, in that you can pass a node to a function that expects an atomic value, but you cannot pass an atomic value to a function that expects a node.

  • You can be more specific about the desired type of the value, to ensure, for example, that it is an xs:integer.

  • You don’t have to cast untyped values to the desired type; this will happen automatically as part of the conversion.

Accepting arguments that are the empty sequence

You may have noticed that many of the XQuery built-in functions accept the empty sequence as arguments, as evidenced by the occurrence indicators * and ?. For example, the substring function accepts the empty sequence for its first argument and returns a zero-length string if the empty sequence is passed to it. This is a flexible way of handling optional elements. If you want to take a substring of an optional number child, if it exists, you can simply specify:

substring($prod/number, 1, 5)

If the substring function were less flexible, and did not accept the empty sequence, you would be required to write:

if    ($prod/number)
then  substring ($prod/number, 1, 5)
else  ""

This can become quite cumbersome if you are nesting many function calls. Generally, your functions should be designed to be easily nested in this way as well.

It is also important to decide how you want the function to handle arguments that are the empty sequence, if they are allowed. In some cases, the default behavior is not appropriate. Using the local:discountPrice function from Example 8-1, suppose $discount is bound to the empty sequence because $prod has no discount child. The min function simply ignores the missing value, so the function applies the full maximum discount to the price. If $price were the empty sequence instead, the function would return the empty sequence, because all arithmetic operations on the empty sequence return the empty sequence.

It is more likely that you want the function to return the original price if no discount amount is provided. Example 8-2 shows a revised function declaration where special checking is done for the case where either $discount or $maxDiscountPct is the empty sequence.

Example 8-2. Handling the empty sequence
declare function local:discountPrice(
  $price as xs:decimal?,
  $discount as xs:decimal?,
  $maxDiscountPct as xs:integer?) as xs:decimal?
{
   let $newDiscount    := if ($discount) then $discount else 0
   let $maxDiscount    := if ($maxDiscountPct)
                          then ($price * $maxDiscountPct) div 100
                          else 0
   let $actualDiscount := min( ($maxDiscount, $newDiscount) )
   return ($price - $actualDiscount)
};
let $prod := doc("prices.xml")//prod[1]
return local:discountPrice($prod/price, $prod/discount, 15)

Functions and Context

Inside a function body, there is no context item, even if there is one in the part of the query that contained the function call. For example, the function shown in Example 8-3 is designed to return all the products with numbers whose second digit is greater than 5. You might think that because the function is called in an expression where the context is the product element, the function can use the simple expression number to access the number child of that product. However, because the function does not inherit the context item from the main body of the query, the processor does not have a context in which to evaluate number.

Example 8-3. Invalid use of context in a function body
declare function local:prod2ndDigit() as xs:string? {
    substring(number, 2, 1)
};
doc("catalog.xml")//product[local:prod2ndDigit() > '5']

Instead, the relevant node must be passed to the function as an argument. Example 8-4 shows a revised function that correctly accepts the desired product element as an argument and uses a path expression ($prod/number) to find the number child. The product element is passed to the function, using a period (.), shorthand for the context item.

Example 8-4. Passing the context item to the function
declare function local:prod2ndDigit($prod as element()?) as xs:string? {
    substring($prod/number, 2, 1)
};
doc("catalog.xml")//product[local:prod2ndDigit(.) > '5']

Recursive Functions

Functions can recursively call themselves. For example, suppose you want to count the number of descendant elements of an element (not just the immediate children, but all the descendants). You could accomplish this by using the function shown in Example 8-5.

Example 8-5. A recursive function
declare function local:num-descendant-elements
  ($el as element()) as xs:integer {
    sum(for $child in $el/*
        return local:num-descendant-elements($child) + 1)
};

The local:num-descendant-elements function recursively calls itself to determine how many element children the element has, how many children its children have, and so on. The only caveat is that there must be a level at which the function stops calling itself. In this case, it will eventually reach an element that has no children, so the return clause will not be evaluated. On the other hand, declaring a function such as:

declare function local:addItUp () { 1 + local:addItUp() };

results in an infinite loop, which will possibly end with an “out of memory” or “stack overflow” error. Even if it is not an infinite loop, processors will have a limit to how many recursive function calls are supported. If you can write your recursive functions as tail-recursive, then most processors will optimize those functions so that they will not overflow the stack.
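The difference between the two styles can be sketched as follows. Both versions of the hypothetical local:sum-to function add up the integers from 1 to $n; whether a given processor actually optimizes the tail-recursive version is implementation-dependent.

```xquery
xquery version "3.1";

(: Not tail-recursive: the addition happens after the recursive call returns. :)
declare function local:sum-to($n as xs:integer) as xs:integer {
  if ($n le 0) then 0
  else $n + local:sum-to($n - 1)
};

(: Tail-recursive: the recursive call is the last thing evaluated,
   and the running total is carried in $acc. :)
declare function local:sum-to($n as xs:integer, $acc as xs:integer) as xs:integer {
  if ($n le 0) then $acc
  else local:sum-to($n - 1, $acc + $n)
};

(local:sum-to(100), local:sum-to(100, 0))
```

Both calls should return 5050. The two declarations can coexist because they have different arities.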

You can also declare mutually recursive functions that call each other. “Copying Input Elements with Modifications” explores the use of recursive functions for making modifications to element structures.
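A classic, if artificial, sketch of mutual recursion is a pair of functions that test whether a number is even or odd. The function names are made up, and the example assumes a non-negative argument; a negative one would never reach the stopping condition.

```xquery
xquery version "3.1";

declare function local:is-even($n as xs:integer) as xs:boolean {
  if ($n eq 0) then true() else local:is-odd($n - 1)
};

declare function local:is-odd($n as xs:integer) as xs:boolean {
  if ($n eq 0) then false() else local:is-even($n - 1)
};

(local:is-even(10), local:is-odd(7))
```

Note that local:is-even calls local:is-odd before the latter is declared, which is allowed because declaration order within a module does not matter.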
