Chapter 4. Functions and Modules

Introduction

In any programming language, functions are one of the most powerful tools available to a developer. XQuery provides both a large built-in function library and the ability to create your own functions. XQuery also allows implementations to interoperate with externally defined functions from other languages.

Every XQuery function has a qualified name, zero or more typed arguments, and a return type. Together, these constitute a function's signature. User-defined functions also have a function body (an XQuery expression that defines the function's behavior).

This chapter finishes laying the foundation of XQuery by describing how to invoke built-in and external functions and how to create and invoke user-defined functions.

Built-in Function Library

XQuery 1.0 defines over one hundred built-in functions, all summarized in Appendix C. Some of these functions come from XPath 1.0, but most are new to XQuery. They range in functionality from the simple string-length() to complex regular expression handlers like matches().

Every built-in function resides in the namespace http://www.w3.org/2003/11/xpath-functions, which is bound to the predefined namespace prefix fn. Because this is also the default function namespace in XQuery, this prefix is generally omitted from built-in function names. For example, the built-in count() function takes one sequence argument and computes its length. Its signature is fn:count($seq as item()*) as xs:integer. This means that its name is fn:count, it takes one argument, $seq, which is a sequence of zero or more items, and it returns an xs:integer.

Some built-in functions are overloaded, meaning that there are several functions with the same name but different arguments and/or return types. One built-in function, concat(), is special in that it takes any number of arguments. In contrast, user-defined functions are never overloaded, and always take a fixed number of arguments.

For example, the overloaded built-in function starts-with() comes in two forms:

fn:starts-with($str as xs:string?, $sub as xs:string?)
               as xs:boolean?

and

fn:starts-with($str as xs:string?, $sub as xs:string?,
               $collation as xs:string) as xs:boolean?

The first form takes two arguments, both of which are sequences containing zero or one xs:string values. It returns a sequence containing zero or one xs:boolean values. The second, overloaded form takes a third argument, $collation, which must be a singleton xs:string value.

Colloquially, we would say that the starts-with() function takes two arguments and an optional third argument, although in reality these are separate overloaded functions.
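As a rough illustration, the query below calls both forms. It assumes a processor that follows the final XQuery 1.0 Recommendation, which is why it uses that Recommendation's Unicode codepoint collation URI; which other collation URIs a processor accepts is implementation-defined.

```xquery
(: Two-argument form: compares using the default collation. :)
starts-with("abracadabra", "abra"),

(: Three-argument form: the codepoint collation compares characters
   exactly, so the difference in case matters here. :)
starts-with("abracadabra", "ABRA",
            "http://www.w3.org/2005/xpath-functions/collation/codepoint")
```

The first call returns true and the second returns false.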

Function Invocation

Functions are invoked by name, passing a comma-separated list of parameter values, one value for each argument required by the function. For example, the starts-with() function can be invoked as shown in Listing 4.1.

Listing 4.1. Passing more than one argument

starts-with("abracadabra", "abra") => true

This parameter list is not itself a sequence. Each parameter can itself be a sequence, and because XQuery doesn't allow nested sequences, a list of sequences cannot be represented as a single sequence. For example, the count() function can be invoked as shown in Listing 4.2, which passes a sequence of values as the single argument to count().

Listing 4.2. Passing a sequence of values as a single argument

count((1, 4, 9)) => 3

The empty sequence can also be passed to functions that accept it. For example, count(()) returns the integer 0, and starts-with((), ()) returns the empty sequence.
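The short query below contrasts these cases; every call passes exactly one argument to count(), even when that argument contains several items or none at all.

```xquery
count((1, 4, 9)),        (: one argument holding three items: 3 :)
count(()),               (: one argument, the empty sequence: 0 :)
count((1, (2, 3), ()))   (: nested parentheses flatten, so this is also 3 :)
```

Writing count(1, 4, 9) instead would be an error, because no count() function takes three arguments.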

If the name used in the function invocation is not prefixed, then the default function namespace is used. This is usually the built-in function namespace; however, you can choose a different default namespace by using the default function namespace declaration in the query prolog, as shown in Listing 4.3. The built-in function namespace is still available through the prefix fn.

Listing 4.3. Changing the default function namespace

declare default function namespace "urn:my-functions";
count((1, 2, 3))     (: invokes your count function :)
fn:count((1, 2, 3))  (: invokes the built-in count function :)

XQuery 1.0 always determines the function to be invoked by comparing names and number of arguments (the arity). If there isn't a function with that name and arity, then an error is raised.

To evaluate a function invocation, XQuery first evaluates each of the parameter expressions passed to the function, in any order. If an implementation can determine that a parameter isn't used, it may skip evaluating that parameter as an optimization.

A complication arises when the types of the parameters passed to the function differ from the types of arguments it expects. For example, what should XQuery do with starts-with(123, 1), or, for that matter, starts-with(xs:token("xyz"), "x")? To resolve this conundrum, XQuery implicitly applies function conversion rules (described next) to each of the parameters.

Function Conversion Rules

When the expected type of an argument is an atomic type (or sequence of atomic types), then the corresponding parameter passed to the function is first atomized, producing a sequence of zero or more atomic values. Each xdt:untypedAtomic value in this sequence is cast to the expected atomic type. Each numeric value that can be promoted to the expected type is promoted. This atomized/promoted/cast value becomes the parameter value passed to the function.

Then, whether the expected type of the argument is atomic or not, the parameter value is matched against the expected argument type using sequence type matching. If the types match—which includes the possibility of subtype substitution—then the function invocation succeeds with these values; otherwise, a type error is raised. (The atomization, numeric type promotion, subtype substitution, and sequence type matching rules are all defined in Chapter 2.)

The function conversion rules may seem complex, but in practice functions mostly do what you expect. Returning to the examples of the previous section, starts-with(123, 1) invokes a function that expects two atomic arguments. So, using the rules above, first each argument is atomized (both are already atomic values), and type promotion doesn't apply (because the expected arguments are non-numeric). Sequence type matching then fails because the parameters have type xs:integer, but starts-with() expects xs:string?, and these types don't match.

In contrast, starts-with(xs:token("xyz"), "x") succeeds because sequence type matching accepts the first argument (xs:token is a subtype of xs:string) and also the second argument (xs:string matches the expected type xs:string?).
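To see these rules at work, the following sketch (assuming a processor that implements the final XQuery 1.0 conversion rules) makes three calls that all return true. The first relies on atomization producing an untyped atomic value that is then cast to xs:string; the second avoids the failing starts-with(123, 1) case by converting the integers explicitly with string(); the third relies on subtype substitution.

```xquery
(: <code>123</code> is atomized to the untyped atomic value "123",
   which is then cast to the expected type xs:string. :)
starts-with(<code>123</code>, "1"),

(: xs:integer is neither a subtype of xs:string nor promotable to it,
   so convert explicitly. :)
starts-with(string(123), string(1)),

(: xs:token is a subtype of xs:string, so no conversion is needed. :)
starts-with(xs:token("xyz"), "x")
```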

User-Defined Functions

Although XQuery's built-in function library is vast, you will quickly find yourself wanting to write your own XQuery functions. Many of the examples in this book involve user-defined functions.

User-defined functions (aka UDFs) are declared in the query prolog, before the main part of the query (the query body), using a function declaration: the keywords declare function, followed by the function signature, followed by an expression enclosed in curly braces ({}) that defines the body of the function, and ending with a semicolon. If the declared function name doesn't use a prefix, then it doesn't have a namespace (not even the default function namespace).

If the return type of the function isn't specified, then it defaults to item()* (the most generic type possible). Similarly, if any of the parameter types are omitted, then they also default to item()*. The parameters must have distinct names, and are in scope for the entire function body. The type of the function body expression must match, according to sequence type matching, the declared return type for the function.
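To illustrate these defaults, the hypothetical function below declares no types at all, so both parameters and the result are item()*. The sketch uses the local: prefix, which the final XQuery 1.0 Recommendation predeclares for functions written in a main module; the listings in this chapter use unprefixed names instead.

```xquery
(: No parameter or return types: each defaults to item()*. :)
declare function local:pair($a, $b) {
  ($a, $b)
};

(: Atomic values, empty sequences, and nodes are all accepted. :)
local:pair(1, "two"),
local:pair((), <x/>)
```

The query returns three items: 1, "two", and an empty x element.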

For example, Listing 4.4 shows a trivial user-defined function that takes no parameters and returns the empty sequence. The return type of the function is declared to be empty().

Listing 4.4. A very simple user-defined function

declare function empty-sequence() as empty() {
  ()
};

Listing 4.5 illustrates a more interesting user-defined function. This function computes the absolute value of an integer passed to it. It takes one integer argument ($i), and returns an integer that is the absolute value of the argument.

Listing 4.5. A slightly more interesting user-defined function

declare function abs($i as xs:integer) as xs:integer {
  if ($i < 0) then -$i else $i
};

As another example, consider the distance() function in Listing 4.6, which takes a three-dimensional point (expressed as x, y, and z coordinates) and returns the square of its distance from the origin.

Listing 4.6. A function with three parameters

declare function distance($x as xs:double,
                         $y as xs:double,
                         $z as xs:double) as xs:double {
  $x*$x + $y*$y + $z*$z
};
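Because the parameters are declared as xs:double, the function conversion rules described earlier let callers pass integers, which are promoted to xs:double. A minimal sketch, using the local: prefix that the final XQuery 1.0 Recommendation expects for functions declared in a main module:

```xquery
declare function local:distance($x as xs:double,
                                $y as xs:double,
                                $z as xs:double) as xs:double {
  $x*$x + $y*$y + $z*$z
};

(: Each xs:integer argument is promoted to xs:double,
   so this returns the xs:double value 9 (1 + 4 + 4). :)
local:distance(1, 2, 2)
```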

User-defined functions are one of the few places in XQuery where some of the many derived types, such as xs:positiveInteger, can be useful. Derived types can be used to constrain the arguments passed to a function, without having to write any code to enforce the constraint yourself. For example, the abs() function in Listing 4.5 could return an xs:nonNegativeInteger, as shown in Listing 4.7.

Listing 4.7. Derived types can be useful to constrain functions

declare function abs($i as xs:integer) as xs:nonNegativeInteger {
  (if ($i < 0) then -$i else $i) cast as xs:nonNegativeInteger
};

In this way, derived types act as a kind of assertion mechanism, verifying that the arguments passed to the function and the value it returns satisfy all appropriate constraints. However, unlike assertions, which are commonly enabled only during development, these type checks and casts are always in effect: they affect both the meaning and the performance of a query, so use them wisely.
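A short usage sketch of the Listing 4.7 function follows, written with the local: prefix that the final XQuery 1.0 Recommendation expects for main-module functions. The declared types do the checking: the result is guaranteed to be an xs:nonNegativeInteger, and a non-integer argument is rejected with a type error.

```xquery
declare function local:abs($i as xs:integer) as xs:nonNegativeInteger {
  (if ($i < 0) then -$i else $i) cast as xs:nonNegativeInteger
};

local:abs(-7),                                    (: 7 :)
local:abs(-7) instance of xs:nonNegativeInteger   (: true :)

(: local:abs(1.5) would raise a type error: 1.5 is an xs:decimal,
   which is neither a subtype of xs:integer nor promotable to it. :)
```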

Recursion

XQuery also allows user-defined functions to invoke themselves (recursion). Recursion is commonly used when processing XML, due to its tree-like nature. For example, the function in Listing 4.8 recursively computes all the ancestors of a node.

Listing 4.8. Recursively computing the ancestors of a node

declare function ancestors-or-self($n as node()?) {
  if (empty($n))
  then ()
  else (ancestors-or-self($n/..), $n)
};
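As a usage sketch, the query below applies the function to a small constructed tree and returns the names of the nodes it finds, outermost first. It assumes the final XQuery 1.0 local: prefix and adds an explicit node()* return type, which Listing 4.8 leaves at its item()* default.

```xquery
declare function local:ancestors-or-self($n as node()?) as node()* {
  if (empty($n))
  then ()                                      (: base case: past the root :)
  else (local:ancestors-or-self($n/..), $n)   (: ancestors first, then $n :)
};

let $doc := <a><b><c/></b></a>
for $n in local:ancestors-or-self($doc//c)
return name($n)                                (: "a", "b", "c" :)
```

The constructed a element has no parent, so $n/.. eventually yields the empty sequence and the recursion stops.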

Recursive functions are especially useful in XQuery because XQuery cannot change the value of an assigned variable, so certain iterative approaches cannot be implemented easily or efficiently.

For example, consider writing your own pow() function, which computes an integer raised to some power. In some programming languages you might write a loop that iteratively accumulates the result. In XQuery, this computation is most easily expressed using recursion, as in Listing 4.9.

Listing 4.9. Recursively computing integer powers

declare function pow($b as xs:integer, $exp as xs:integer)
                                                   as xs:integer {
  if ($exp > 0)
  then $b * pow($b, $exp - 1)
  else 1
};

pow(2, 3)  => 8
pow(2, 16) => 65536

Both of these examples used a conditional if/then/else to guard against infinite recursion. For additional examples of recursion, such as a deep-copy() function, see Chapter 10.
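Listing 4.9 makes one recursive call for each unit of the exponent. A common alternative, swapped in here rather than taken from the text, is exponentiation by squaring, which halves the exponent at each step and therefore needs only about log2($exp) calls. The function name below is hypothetical, and the sketch uses the final XQuery 1.0 local: prefix.

```xquery
declare function local:fast-pow($b as xs:integer, $exp as xs:integer)
                                                    as xs:integer {
  if ($exp <= 0)
  then 1
  else
    (: Compute b^(exp idiv 2) once, then square it. :)
    let $half := local:fast-pow($b, $exp idiv 2)
    return if ($exp mod 2 = 0)
           then $half * $half          (: even exponent :)
           else $half * $half * $b     (: odd exponent needs one more factor :)
};

local:fast-pow(2, 16)   (: 65536 :)
```

Like Listing 4.9, it returns 1 for any exponent that is not positive, and the if/then/else again guards against infinite recursion.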

External Functions

Some implementations support externally defined functions (and variables). The mechanism by which this is done varies from one implementation to the next. XQuery itself provides only the syntax to use when declaring these externals, shown in Listing 4.10. Essentially, they are like ordinary user-defined functions except that the external keyword is used instead of a function body.

Listing 4.10. Some implementations support external functions and variables

declare function foo($param as xs:integer) as xs:string external;
declare variable $var as xs:decimal external;

Modules

XQuery allows queries to be organized into modules. Most XQuery programs consist of a single module, the main module, but larger programs may have additional library modules.

A library module must contain a module declaration in its query prolog, and no query body (only user-defined functions and global variables). The module declaration associates a target namespace with the module, which other queries later use to identify the module when importing it.

The target namespace also becomes the default function namespace for the module (unless you specify a different default using the default function namespace declaration in the prolog). The name of every function and global variable in the module must have the same namespace as the target namespace, as shown in Listing 4.11.

Listing 4.11. A sample library module

module namespace my = "urn:my-functions";
declare function my:one() { 1 };
declare function my:two() { 2 };
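Listing 4.11 shows only functions, but the same namespace rule covers global variables. The hypothetical library module below declares both in the target namespace. It uses the final XQuery 1.0 syntax for variable initialization (:=), which may differ from the draft syntax this chapter is based on.

```xquery
(: Library module, saved in a hypothetical file named my-functions.xq. :)
module namespace my = "urn:my-functions";

(: A global variable: its name must also be in the target namespace. :)
declare variable $my:greeting as xs:string := "Hello";

declare function my:greet($name as xs:string) as xs:string {
  concat($my:greeting, ", ", $name, "!")
};
```

A main module could then use it with import module namespace my = "urn:my-functions" at "my-functions.xq"; followed by a query body such as my:greet("XQuery"), which returns "Hello, XQuery!". How the at location hint is resolved, and whether it is needed at all, is implementation-defined.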

Any module may import library modules using the import module statement in the query prolog. For example, Listing 4.12 imports all of the user-defined functions and global variables defined in the library module of Listing 4.11.

Listing 4.12. Using the library module defined in Listing 4.11

import module namespace my = "urn:my-functions";

my:one() + my:two()

Modules are especially convenient when writing lots of user-defined functions. You can group related functions into one namespace, and put them into their own module, to be reused over and over again by other queries you write. However, support for modules varies from one implementation to another; implementations aren't required to support them at all. For more information about the module declaration and import module statements, see the Query Prolog section at the end of Chapter 5.

Conclusion

Each user-defined function has a qualified name, a return type, and zero or more typed parameters. When invoking a function, both its parameter values and its return value are implicitly converted, if necessary, using the function conversion rules.

XQuery provides a large built-in function library and also empowers you to write your own functions. User-defined functions cannot overload each other or existing built-in functions. XQuery also allows implementations to provide access to externally defined functions, which are declared in the query with XQuery types.
