Chapter 23. Function Items and Higher-Order Functions

Starting with version 3.0, the XQuery data model treats functions as full-fledged items, a status previously held only by nodes and atomic values. This opens up a lot of new functionality: functions can, for example, be bound to variables or passed as arguments to other functions. This chapter explores the possibilities that function items allow and the new XQuery syntax associated with them.

All of the functionality described in this chapter is supported only by implementations that support the optional Higher-Order Function Feature.

Why Higher-Order Functions?

The inclusion of functions as items into the data model allows them to be passed to and from other functions. This enables higher-order functions, which are functions that take other functions as parameters, and/or return other functions as their result. For more complex applications, higher-order functions can significantly simplify XQuery code and make it more flexible.

Higher-order functions are particularly important in XQuery because there is no other convenient way of parameterizing the code that gets executed. There is no dynamic dispatch based on the type of an object, as in object-oriented languages, and there is no dynamic selection of template rules based on pattern matching as in XSLT. If you want to write a function that does something generic, like visiting all the nodes on the tree and doing some action on each one, and you don’t want to hard-code the list of possible actions, then higher-order functions are the way to achieve it.

For example, the built-in sort function is a higher-order function that lets you sort a sequence dynamically by supplying a function that generates the sort key. The built-in filter function lets you filter a sequence dynamically by supplying a function that returns a Boolean value indicating whether each item should be kept. A web application that displays product information could let users choose which products they want to see (based on some filtering criteria) and in what order the products should appear. While it would be possible to write a lengthy query that hard-codes every combination of filtering and sorting, higher-order functions make such a query much more flexible and streamlined.
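As a minimal sketch of the product scenario, assuming hypothetical product elements with name, price, and dept attributes, the user's choices can be represented as two function items and handed to the built-in filter and sort functions (fn:sort requires an XQuery 3.1 processor):

```xquery
(: Hypothetical product data; the element and attribute names are illustrative only :)
let $products := (
  <product name="Shirt" price="29.99" dept="WMN"/>,
  <product name="Hat" price="19.99" dept="ACC"/>,
  <product name="Scarf" price="12.50" dept="ACC"/>
)
(: The user's filtering choice: keep only accessories :)
let $keep := function($p as element(product)) as xs:boolean { $p/@dept = 'ACC' }
(: The user's sorting choice: order by numeric price :)
let $sort-key := function($p as element(product)) as xs:decimal { xs:decimal($p/@price) }
return sort(filter($products, $keep), (), $sort-key) ! string(@name)
(: returns ("Scarf", "Hat") :)
```

Supporting a different choice only means supplying a different function item; the final expression stays the same.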

Constructing Functions and Calling Them Dynamically

In order to understand how to use the built-in higher-order functions, and to write your own, it is first necessary to learn the syntax used to create a function item, and to call it. A function item is simply a function that is being treated like an item in the data model. As with other items in the data model, function items can be constructed in several ways. Once they are constructed, they can then be dynamically called.

Named Function References

A named function reference can be used to create a function item based on the function name and arity. It consists of the name of the function, followed by #, followed by an integer that represents the arity (the number of arguments it takes). For example, substring#2 is a named function reference that refers to the two-argument version of the built-in substring function. The arity is required to uniquely identify the function, since it is possible to have two named functions with the same name, as long as their arity is different.

The result of a named function reference is a function item that can then be called dynamically later. For example, you might bind the function to a variable $f and then call it dynamically as follows:

let $f := upper-case#1
return $f('abc')

You can think of the $f variable as a pointer or reference to the function. It is called by placing an argument list in parentheses directly after it. This is known as a dynamic function call. This is in contrast to a static function call, where the argument list is preceded by a specific function name, for example upper-case('abc').

A named function reference can also refer to a user-defined function, for example local:concatNames#1 or functx:index-of-node#2. As with all names in XQuery, the names in named function references are namespace-sensitive. If the name is unprefixed, the default function namespace applies, which is appropriate for most of the built-in functions like upper-case.

If you specify a function that does not exist, or the arity is incorrect (for example upper-case#4), error XPST0017 is raised.
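The arity in a named function reference is what selects between functions that share a name. As a small illustration, the built-in substring function has both a two-argument and a three-argument version, and each reference yields a different function item:

```xquery
let $from  := substring#2   (: substring($sourceString, $start) :)
let $slice := substring#3   (: substring($sourceString, $start, $length) :)
return ($from('abcdef', 3), $slice('abcdef', 2, 3))
(: returns ("cdef", "bcd") :)
```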

Using function-lookup to Obtain a Function

An alternative to using a named function reference is to use the built-in function-lookup function, which takes the name of the function and the arity as arguments.

let $f := function-lookup(xs:QName("fn:upper-case"), 1)
return if (exists($f)) then $f('abc') else 'N/A'

This works much like a named function reference, but it gives you more flexibility because you can check whether a function exists before you attempt to call it. This is useful in a complex, modular application where the set of available functions can vary, or when you need to check for an extension function implemented by a specific processor. If the function does not exist, function-lookup returns the empty sequence rather than raising an error, so you can test for that case. Calling $f would still raise an error if it were the empty sequence, which is why the example tests for the existence of $f before making the dynamic call.
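Because function-lookup takes the function name as a run-time value, the existence check can be written once and reused. In this minimal sketch, $call-or-default is a hypothetical helper, and local:no-such-function is deliberately a name that is not declared anywhere:

```xquery
(: Hypothetical helper: call a one-argument function by name, or fall back to a default :)
let $call-or-default :=
  function($name as xs:QName, $arg as xs:string, $default as xs:string) {
    let $f := function-lookup($name, 1)
    return if (exists($f)) then $f($arg) else $default
  }
return (
  $call-or-default(xs:QName('fn:upper-case'), 'abc', 'N/A'),
  $call-or-default(xs:QName('local:no-such-function'), 'abc', 'N/A')
)
(: returns ("ABC", "N/A") :)
```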

Inline Function Expressions

Another way to construct a new function item is through an inline function expression, which creates an anonymous function. Like a named function reference, an inline function expression creates a function item that you can then bind to a variable. For example:

let $subtract := function($a, $b) { $a - $b }
return $subtract(100, 30)

creates an inline function and binds it to the variable $subtract. The function is then called dynamically in the return clause, returning the value 70.

The syntax of an inline function expression is similar to that of a user-defined function declaration, except that the function has no name and the declare keyword is not used. It consists of the keyword function, followed by a parameter list in parentheses, an optional return type, and the function body in curly braces. Inline function expressions can have annotations, except that neither %public nor %private is allowed.

In the simple example above, there are no sequence types for the parameters or return type, but just like a user-defined function, they can be specified, as in:

let $subtract := 
   function($a as xs:integer, $b as xs:integer) as xs:integer { $a - $b }
return $subtract(100, 30)
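An inline function expression does not have to be bound to a variable first; it can be written directly wherever a function item is expected. In this small sketch it is passed straight to the built-in for-each function:

```xquery
(: The anonymous function is the second argument of for-each :)
for-each((1, 2, 3), function($n as xs:integer) as xs:integer { $n * $n })
(: returns (1, 4, 9) :)
```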

There is an important difference between inline function expressions and named function definitions: inline functions have access to the context in which they are created, including any variables that may be in scope.

In functional programming languages, the combination of an anonymous function and its context is known as a closure. The body of a named function, by contrast, cannot see the variables that are in scope where it is called; it has access only to its own parameters and to any global variables declared in the prolog. In the following example, the inline function has access to the $max-num variable. If a named function were used instead, the value of $max-num would have to be passed to it as an argument.

let $max-num := 3
let $get-n-results := 
   function($seq as item()*) as item()* { $seq[position() <= $max-num] }
return $get-n-results( ("a", "b", "c", "d", "e") )
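Because a closure keeps the values it captured, an inline function can also build and return other functions. In this minimal sketch, $make-multiplier is a hypothetical helper; each function item it returns remembers the $factor value it was created with:

```xquery
(: Hypothetical helper: returns a new function item that closes over $factor :)
let $make-multiplier :=
  function($factor as xs:integer) as function(xs:integer) as xs:integer {
    function($n as xs:integer) as xs:integer { $n * $factor }
  }
let $double := $make-multiplier(2)
let $triple := $make-multiplier(3)
return ($double(10), $triple(10))
(: returns (20, 30) :)
```

This is also a simple case of a higher-order function, since $make-multiplier returns a function as its result.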

Inline function expressions are most useful for very simple functions, when a closure is needed, or when you want to defer evaluation. They are also useful when your query is limited to XPath rather than XQuery, because XPath has no facility for defining named functions. In XQuery, especially when a longer, more complex function is required, it makes sense to declare a named function that can also be called from other contexts. A named function can still be called dynamically (through a named function reference) and used in any way that an inline function can.

Partial Function Application

A partial function application is when one or more question marks (?) are used as placeholders for arguments of a function. For example:

let $first-two-characters := substring(?, 1, 2)
return $first-two-characters('abc')

This binds a function item to the variable $first-two-characters with the second and third arguments being fixed. The function is then called in the return clause, returning the string ab. The processor knows to give $first-two-characters a function item as a value because of the placeholder. If no placeholders had been used, it would have been a normal static function call and a string result would have been bound to $first-two-characters.

It is possible to use placeholders for any or all of the arguments. In the following example, placeholders are used for both the first and third arguments:

let $prefix := substring(?, 1, ?)
return $prefix("abc", 2)

When calling the function, you supply only the arguments that correspond to placeholders. In the example, "abc" corresponds to the first argument of substring, and 2 corresponds to the third argument, based on the positions of the placeholders. Using placeholders for all arguments, for example substring(?, ?, ?), is equivalent to using a named function reference such as substring#3.

Partial function application can also be used with dynamic function calls. In the following example, the first let clause partially applies the built-in substring function, while the second let clause partially applies the function item bound to $prefix by the first let clause:

let $prefix := substring(?, 1, ?)
let $first-two-characters := $prefix(?, 2)
return $first-two-characters("abc")
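Partial application works the same way on inline functions. In this sketch, fixing the first argument of a hypothetical two-argument $add function yields a one-argument function that can be passed directly to for-each:

```xquery
let $add := function($a as xs:integer, $b as xs:integer) as xs:integer { $a + $b }
(: $increment is a one-argument function item: $add with $a fixed to 1 :)
let $increment := $add(1, ?)
return for-each((10, 20, 30), $increment)
(: returns (11, 21, 31) :)
```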

The Arrow Operator and Dynamic Function Calls

The arrow operator, introduced in “Calling Functions with the Arrow Operator”, is especially useful for chaining together multiple function calls. It is possible to use the arrow operator to call functions dynamically as well. For example, the following query uses a partial function application with the arrow operator:

let $first-two-characters := substring(?, 1, 2)
return "abc" => $first-two-characters() => upper-case()

The following query uses an inline function expression with the arrow operator:

let $trim := function($arg) { replace($arg, '^\s*(.+?)\s*$', '$1') }
return " abc " => $trim() => upper-case()
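In XQuery 3.1, the function after => may also be a parenthesized expression, so an inline function expression can appear directly in the chain without being bound to a variable first. A small sketch, using normalize-space in place of the regular expression:

```xquery
(: The parenthesized inline function is called with the left-hand value as its argument :)
" abc " => (function($s as xs:string) as xs:string { normalize-space($s) })() => upper-case()
(: returns "ABC" :)
```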

Syntax Recap

We have looked at a number of new ways to call functions in this section. For convenience, the various approaches are summarized in Table 23-1.

Table 23-1. Ways of calling a function
Call type | Example | Chapter/Section
--- | --- | ---
Static function call | substring('abc', 1, 2) | “Function Calls”
Named function reference | let $f := substring#3 return $f('abc', 1, 2) | “Named Function References”
Function lookup | let $f := function-lookup(xs:QName("fn:substring"), 3) return $f('abc', 1, 2) | “Using function-lookup to Obtain a Function”
Partial function application | let $f := substring(?, 1, 2) return $f('abc') | “Partial Function Application”
Inline function expression | let $f := function($arg as xs:string) { substring($arg, 1, 2) } return $f('abc') | “Inline Function Expressions”
Arrow operator | 'abc' => substring(1, 2) | “Calling Functions with the Arrow Operator” and “The Arrow Operator and Dynamic Function Calls”

Functions and Sequence Types

As with any item in the XQuery data model, a function item can be described by a sequence type, known as a function test. Sequence types, which are discussed in detail in “Sequence Types”, are used in a variety of XQuery expressions to indicate allowed values. The most common use is in the signatures of user-defined functions, but they are also used in other expressions such as instance of and typeswitch. The syntax for a function test is shown in Figure 23-1.

Figure 23-1. Syntax of a function test

The generic test function(*) can be used to match any function. A more specific test specifies the types of the parameters, and the return type. For example, function(element()*, xs:string) as xs:integer? matches a function that takes two arguments (the first is zero or more elements and the second is an atomic value of type xs:string) and returns zero or one xs:integer values. Note that within the parentheses there are only sequence types, no parameter names.

It is also possible to specify a sequence of multiple functions by using the standard occurrence indicators. In that case, surround the function test itself in parentheses, so that the occurrence indicator that applies to the sequence of functions is not confused with the occurrence indicator of the return type. For example, (function(element()*, xs:string) as xs:integer?)* matches zero or more of the functions just described; the parentheses make it clear that the * applies to the sequence as a whole, whereas the ? applies to the xs:integer return type.

Function items also match the generic item() sequence type.
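As a quick check of these rules, the following query tests a named function reference against several sequence types. Among other conditions, a function item matches a typed function test only if its arity agrees, so the one-parameter test fails for substring#2:

```xquery
let $f := substring#2
return (
  $f instance of function(*),                                  (: true :)
  $f instance of function(xs:string?, xs:double) as xs:string, (: true :)
  $f instance of function(xs:string) as xs:string,             (: false: wrong arity :)
  $f instance of item()                                        (: true :)
)
```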

Higher-Order Functions

Now that you’ve seen the syntax for creating a function item and dynamically calling it, you can start to create and use higher-order functions. Example 23-1 shows a simple example of a higher-order function named local:for-each. It applies a function to every item in a sequence. The first argument ($seq) is the sequence to operate on, and the second argument ($f) is the function to apply.

Example 23-1. Simple higher-order function
xquery version "3.0";
declare function local:for-each
   ($seq, 
    $f) 
   {
    for $item in $seq return $f($item)
   };
local:for-each( ("abc", "def"), upper-case#1)

When calling this function with upper-case#1 as the second argument, the result is the two strings ABC and DEF, because that is the result of applying upper-case to each string.

To make your code a bit more robust, you could add sequence types to your parameters, and a return type to the function, as shown in Example 23-2.

Example 23-2. Simple higher-order function with sequence types
xquery version "3.0";
declare function local:for-each
   ($seq as item()*, 
    $f as function(item()) as item()*) 
   as item()* 
   {
    for $item in $seq return $f($item)
   };
local:for-each( ("abc", "def"), upper-case#1)

Because this is a very generic function, generic sequence types are used. The $seq parameter can be a sequence of any items. The only constraint on $f is that it must take a single argument, and that argument must be exactly one item; it can return any sequence of items. The local:for-each function can also return any sequence of items, since it simply returns the concatenation of the results of all the calls to $f.
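To see the concatenation behavior, consider a supplied function that returns more than one item per call. This sketch uses the built-in for-each function (which has the same signature as local:for-each) with a partial application of tokenize:

```xquery
(: Each call returns a sequence of tokens; the results are concatenated in order :)
for-each(("a b", "c d e"), tokenize(?, ' '))
(: returns ("a", "b", "c", "d", "e") :)
```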

Built-In Higher-Order Functions

The local:for-each example was useful for illustration, but there is already a built-in function named for-each that does the same thing, with the same signature. A number of built-in functions are higher-order functions; the most useful ones are described briefly in this section and in detail in Appendix A. Additional higher-order functions that work on maps and arrays are described in the next chapter.

filter

Returns the items for which a supplied function returns true. For example, the following query returns (5, 6):

filter( (4, 5, 6), function($n) {$n > 4})
fold-left, fold-right

Apply a supplied function to each item of a sequence in turn, accumulating a result as they proceed (fold-left starts from the first item, fold-right from the last). For example, the following query returns abc:

fold-left( ("a", "b", "c"), "", concat(?, ?) )
for-each

Applies a supplied function to each item in a sequence and returns the concatenated results. For example, the following query returns ("A", "B", "C"):

for-each( ("a", "b", "c"), upper-case(?))
for-each-pair

Applies a supplied function to pairs of items taken from two sequences. For example, the following query returns ("ax", "by", "cz"):

for-each-pair( ("a", "b", "c"), ("x", "y", "z"), concat#2)
sort

Sorts a sequence, using a function to determine the sorting key. For example, the following query returns (-2, 4, -6):

sort( (-6, -2, 4), (), abs#1)
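The order in which fold-left and fold-right combine items matters when the supplied function is not associative. Note also that the supplied function takes its arguments in a different order: ($accumulator, $item) for fold-left and ($item, $accumulator) for fold-right. With subtraction, the two folds give different results:

```xquery
(
  (: ((0 - 1) - 2) - 3 :)
  fold-left((1, 2, 3), 0, function($acc, $n) { $acc - $n }),
  (: 1 - (2 - (3 - 0)) :)
  fold-right((1, 2, 3), 0, function($n, $acc) { $n - $acc })
)
(: returns (-6, 2) :)
```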

Writing Your Own Higher-Order Functions

The built-in higher-order functions show some of the utility of higher-order functions. They are most appropriate when flexibility is needed, or when applying different functions to the same structures over and over. Suppose you need a function that iterates over the prices in the prices.xml document and creates a price list in HTML. This is easy enough to do without any higher-order functions, as shown in Example 23-3.

This is pretty straightforward code, but it is actually doing three separate things:

  • Traversing the prices document and finding the right priceList based on dates

  • Calculating each price by subtracting the value of the discount element (if there is one) from the value of the price element

  • Structuring and formatting the output

Example 23-3. Price list generation without higher-order functions
xquery version "3.0";
declare function local:price-list
   ($price-doc as document-node()) {
    <html>
      <head>
        <title>Price List</title>
        <link href="companystandard.css"/>
      </head>
      <body>
        <h1>Price List</h1>
        <table>
          <tr>
            <th>Prod number</th>
            <th>Price</th>
          </tr>
          {for $prod in $price-doc//priceList[@effDate < current-date()]/prod
          return
          <tr>
            <td>{data($prod/@num)}</td>
            <td>{if (exists($prod/discount))
                 then $prod/price - $prod/discount
                 else data($prod/price)}</td>
          </tr>
        }</table>
      </body>
    </html>
};
local:price-list(doc("prices.xml"))

Suppose someone in sales decides that a different algorithm should be used to calculate discounts for some customers: for example, applying a 10% reduction to the price rather than using the discount elements. It is easy enough to copy the local:price-list function to create a new function named, say, local:price-list-based-on-percents. However, only a few lines differ between the two functions, and you have unnecessarily repeated the rest of the code. You could modularize the code further, with separate local:page-header and local:table-header functions that are called from both price list functions, but you would still end up repeating a lot of code.

A cleaner approach is to pass the algorithm for calculating prices as a function to the local:price-list function, as shown in Example 23-4. This isolates the code that is different and keeps the local:price-list function as something that can be generically called with either price function, depending on the customer. It allows room for new, future pricing algorithms as they are invented by the sales team.

To call the function for customers that use the discount-based algorithm, you could use a named function reference, for example:

local:price-list(doc("prices.xml"), local:price-based-on-discount#1)

Example 23-4. Price list generation with function for discount algorithm
xquery version "3.0";
declare function local:price-list
   ($price-doc as document-node(),
    $price-calc-func as function(element(prod)) as xs:decimal) {
    <html>
      <head>
        <title>Price List</title>
        <link href="companystandard.css"/>
      </head>
      <body>
        <h1>Price List</h1>
        <table>
          <tr>
            <th>Prod number</th>
            <th>Price</th>
          </tr>
          {for $prod in $price-doc//priceList[@effDate < current-date()]/prod
          return
          <tr>
            <td>{data($prod/@num)}</td>
            <td>{$price-calc-func($prod)}</td>
          </tr>
        }</table>
      </body>
    </html>
};
declare function local:price-based-on-discount
  ($prod as element(prod)) as xs:decimal {
   if (exists($prod/discount))
   then xs:decimal($prod/price) - xs:decimal($prod/discount)
   else xs:decimal($prod/price)
};
declare function local:price-based-on-percent
 ($prod as element(prod)) as xs:decimal {
   xs:decimal($prod/price) * .90
};
local:price-list(doc("prices.xml"), local:price-based-on-discount#1)

Other algorithms could be isolated as well, such as the algorithm for deciding on the right price list, or choosing which CSS to use to format the report. Each customer might have a configuration profile that stores their discount policy and their user interface preferences as function items that can be passed to the local:price-list function. Although this is a simple example, it is not hard to imagine a more complex scenario that requires maximum flexibility when providing custom data and user interfaces to a variety of users.
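As a minimal sketch of that idea, the pricing policy can be chosen at run time and the corresponding function item passed along. Here the prod data, the $policy value, and the variable names are hypothetical stand-ins for prices.xml and a customer profile; the two inline functions mirror local:price-based-on-discount and local:price-based-on-percent from Example 23-4:

```xquery
(: Hypothetical sample data standing in for the prod elements of prices.xml :)
let $prods := (
  <prod num="557"><price>29.99</price><discount>10.00</discount></prod>,
  <prod num="563"><price>69.99</price></prod>
)
(: Two pricing algorithms, mirroring the named functions in Example 23-4 :)
let $by-discount := function($prod as element(prod)) as xs:decimal {
  if (exists($prod/discount))
  then xs:decimal($prod/price) - xs:decimal($prod/discount)
  else xs:decimal($prod/price)
}
let $by-percent := function($prod as element(prod)) as xs:decimal {
  xs:decimal($prod/price) * 0.90
}
(: Hypothetical customer setting; in practice it might be read from a profile document :)
let $policy := 'percent'
let $price-calc-func := if ($policy = 'percent') then $by-percent else $by-discount
for $prod in $prods
return <tr><td>{data($prod/@num)}</td><td>{$price-calc-func($prod)}</td></tr>
```

The table-building code never changes; only the function item bound to $price-calc-func does.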
