Chapter 16. XSLT 2.0 and XPath 2.0

Although XSLT 2.0 and XPath 2.0 are still working drafts at the time of this writing, they are nearing completion, and there are some partial implementations available for these specs, such as Saxon 7.7 (check http://saxon.sourceforge.net for the latest version). This chapter attempts to summarize some of the more interesting features in these specifications, and demonstrates a few of them, too. But it won’t be an exhaustive review of XSLT 2.0 or XPath 2.0, partly because these specs are still changing, and partly because an exhaustive review would take up a whole book by itself.

Tip

The material in this chapter is based on the May 2003 working drafts of XSLT 2.0 and XPath 2.0, so it is possible that things will change in those drafts by the time you read this.

First of all, I’ll highlight some of the changes that have been made since XSLT 1.0 and XPath 1.0, and I’ll also mention a few of the features that have been added. Then I’ll show you how you can put some of this new stuff to work today.

Rather than just two specifications, as is the case with XSLT 1.0 and XPath 1.0, the next versions of these specs are broken into five documents. Three new documents have been broken out for those features of XSLT and XPath that also support the XML Query Language (see http://www.w3.org/TR/xquery/).

XSL Transformations (XSLT) Version 2.0 (see http://www.w3.org/TR/xslt20/)

This evolution of the XSLT 1.0 specification is about twice as long as its predecessor. Although it’s lengthy, I think this spec is clearer than 1.0, and it even sports a glossary.

XML Path Language (XPath) 2.0 (see http://www.w3.org/TR/xpath20/)

XPath has also evolved; the data model and functions are now documented in separate specifications.

XQuery 1.0 and XPath 2.0 Data Model (see http://www.w3.org/TR/xpath-datamodel/)

XPath has an upgraded data model that applies to XQuery as well. The terminology used to describe the data model has been changed and refined, so although the data model for XSLT 2.0 is technically very similar to that of XPath 1.0, it is now described in more formal language.

XQuery 1.0 and XPath 2.0 Functions and Operators (see http://www.w3.org/TR/xpath-functions/)

Many functions that also support XQuery have been added to XPath. The function library has more than tripled in size, from under 30 functions in 1.0 to over 100 in 2.0 (counting functions in all signatures).

XSLT 2.0 and XQuery 1.0 Serialization (see http://www.w3.org/TR/xslt-xquery-serialization/)

This description of how result trees are serialized, which was previously an integral part of the XSLT spec, has been pulled out into a separate document so that it can be used in non-XSLT environments such as XQuery.

New XSLT 2.0 Features

Listed below are some of the new features added to the XSLT 2.0 specification:

Terminology changes

XSLT makes a number of refinements to terminology, and a glossary is now available at the end of the specification. For example, the term result tree fragment has been replaced by the term temporary tree. A temporary tree is natively a sequence of nodes, obviating the need for an extension function such as node-set( ) to convert a result tree fragment to a node-set. Another example: the content of a template is now known as a sequence constructor. A sequence can contain nodes or atomic values.

XHTML output

In addition to xml, html, and text, XSLT 2.0 adds the xhtml output method (see Section 20 of XSLT 2.0 and Section 5 of the serialization specification).

Multiple result trees

One of the most welcome new features in XSLT 2.0 is the ability to produce multiple result trees, rather than just one. This is accomplished through the result-document element. This element is similar to the saxon:output element you saw in the last chapter, though it has somewhat different attributes. You will see an example of this in Section 16.3, later in this chapter. Also see Section 19.1 of XSLT 2.0.

Regular expressions

A regular expression describes text with a pattern made up of characters that have special meaning within the expression. The analyze-string element, together with matching-substring and non-matching-substring child elements, allows you to analyze a string using a regular expression. The XPath 2.0 functions matches( ), replace( ), and tokenize( ) also make use of regular expressions. See “Using Regular Expressions” later in this chapter for an example. See also Section 15 of XSLT 2.0.

Validation support for XML Schema

A schema-aware XSLT processor supports validation using W3C XML Schema. This support is not required, however. There is also a conformance level for a basic XSLT processor that does not support validation. See Section 21 of XSLT 2.0. XML Schema support, in fact, goes well beyond just validation (in the sense of rejecting invalid documents). Once a source document has been validated against a schema, you can use information about the types of different nodes. For example, you could write a template rule that processes any attribute of type date.

Date format

Just as numbers could be formatted with the format-number( ) function and the decimal-format element in XSLT 1.0, a date may be formatted with the format-date( ) function used with the date-format element. See Section 16.5 of XSLT 2.0.

Character maps

A new character map declaration, written with the character-map element, lets a stylesheet specify how particular characters are written during serialization. Each output-character child element maps a single character to a string that replaces it in the output. This functionality is an improvement over the disable-output-escaping attribute in XSLT 1.0. See Section 20.1 of XSLT 2.0.

Grouping

Using the new for-each-group element, XSLT 2.0 now offers a built-in grouping feature, rather than depending on common yet nonstandard approaches used in XSLT 1.0. See “Grouping in XSLT 2.0” in this chapter as well as Section 14 of XSLT 2.0.

Parameters in new places

You can pass a parameter to the template rule having the highest import precedence using with-param as a child of the apply-imports element. You can also pass parameters using the next-match element, which matches other template rules besides the current one (which happens to have the highest priority). See Section 6.7 of XSLT 2.0.

New elements

Besides those already mentioned, XSLT 2.0 adds a half dozen other elements:

function element

Defines a stylesheet function. Stylesheet functions are similar to named templates, except that rather than invoking them using a call-template instruction, you can invoke them using a function call anywhere in an XPath expression. This makes them more versatile than templates—for example, you can write a function to compute a sort key.

import-schema element

Imports an XML Schema for validation by a schema-aware XSLT processor.

namespace element

Creates a namespace node. This is useful (in rare cases) when you need to decide at runtime which namespaces to include in the result tree.

next-match element

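Invokes the next-best template rule that matches the current node, that is, a rule of lower priority or import precedence than the current one. Unlike apply-imports, it considers rules in the current stylesheet as well as in imported stylesheets.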

sequence element

Constructs a sequence of nodes or atomic values.

sort-key element

Declares a named sort key; holds one or more sort elements.

New attributes on existing elements

A number of new attributes appear on elements that have existed since XSLT 1.0 and are listed here:

as attribute

Added to key, param, template, and variable, this attribute specifies the required type for the result.

collation attribute

Identifies a named collation for ordering strings; this attribute has now been added to the key and sort elements.

copy-namespaces attribute

Available on the copy and copy-of elements with a value of yes or no. The default is yes.

disable-output-escaping attribute

Now appears on attribute; it appeared only on text and value-of in XSLT 1.0.

type attribute

Appears on attribute, copy, copy-of, and element in order to associate with the item type from a schema.

undeclare-namespaces attribute

Appears on output to specify whether to undeclare namespaces in the output. This feature anticipates support for XML Namespaces 1.1, which allows namespaces to be undeclared.

validation attribute

Appears on attribute, copy, copy-of, and element, with one of four possible values: lax, preserve, strict, or strip. This is closely associated with the type attribute.

New attributes on output

A number of new attributes also have been added to the output element:

escape-uri-attributes attribute

Specifies whether a processor escapes URIs in HTML and XHTML; value must be yes or no.

include-content-type attribute

Specifies whether to add a meta element in HTML and XHTML output; value must be either yes or no.

name attribute

An output declaration may now be labeled with a name attribute. This is used in conjunction with result-document which allows multiple result trees; these can either all use the same output format or use a variety of different output formats.

normalize-unicode attribute

Indicates (yes or no) whether the serialized output should be converted to Unicode Normalization Form C (see http://www.unicode.org/unicode/reports/tr15/).

use-character-maps attribute

Identifies a named character map defined by the character-map element.
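To make the function element described earlier in this list a little more concrete, here is a minimal sketch of a stylesheet function that computes a sort key. It assumes the functions.xml document shown in Example 16-1. The my prefix, its namespace URI, and the function name my:local are all hypothetical, and the syntax follows the final XSLT 2.0 Recommendation, so a draft-era processor such as Saxon 7.7 may expect something slightly different.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.com/ns/my"
  exclude-result-prefixes="xs my">
<xsl:output method="text"/>

<!-- Hypothetical function: returns a name without its leading fn: prefix -->
<xsl:function name="my:local" as="xs:string">
 <xsl:param name="qname" as="xs:string"/>
 <xsl:sequence select="if (starts-with($qname, 'fn:'))
                       then substring($qname, 4)
                       else $qname"/>
</xsl:function>

<xsl:template match="functions">
 <xsl:for-each select="function">
  <!-- The function is called from XPath, here as a sort key -->
  <xsl:sort select="my:local(name)"/>
  <xsl:value-of select="my:local(name)"/>
  <xsl:text>&#10;</xsl:text>
 </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Because my:local( ) is called from inside an XPath expression, the same function serves both the sort element and the value-of element, something a named template could not do as directly.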

That’s just a few of the new features in XSLT 2.0; next, I’ll discuss some of the new ones found in XPath 2.0.

New XPath 2.0 Features

Following are just a handful of some of the new features added to the XPath 2.0 specification:

Improved terminology

XPath has tightened up its terminology, and a glossary will be available at the end of the specification in later drafts. For example, the result of an expression is now considered a sequence of zero or more items, and an item is either a node or an atomic value, such as an integer, as defined by XML Schema datatypes (see http://www.w3.org/TR/xmlschema-2/). This is much more than a terminology change. You can now have sequences of integers or strings (there are many more datatypes) as well as sequences of nodes.

New functions

XPath 2.0 has over 100 functions, compared with 27 in XPath 1.0 (I’m counting functions with the same name but different signatures or argument lists as one function). They are too numerous to list in this book, but you can peruse them in the new functions and operators specification (see http://www.w3.org/TR/xpath-functions/).

Strongly typed

XPath 2.0 has grown into a strongly typed language. It recognizes datatypes from XML Schema and also its own datatypes, such as xdt:anyAtomicType. See Section 2.4 of XPath 2.0.

New kind tests

New kind tests are now offered that test kinds of nodes, such as document-node( ), element( ), and attribute( ); for example, document-node( ) matches the document node (root node in XSLT 1.0). You can also test with empty( ) and item( ). The occurrence indicators ? (zero or one), * (zero or more), and + (one or more) are also in the mix; for example, item( )* matches zero or more atomic values or nodes. See Section 2.4 of XPath 2.0.

Sequences and ranges

Sequence expressions allow you to specify a sequence of items that can be atomic values or nodes; for example, (100, 101, 102) will return a sequence of the atomic values 100, 101, and 102, in that order. Range expressions let you represent a range of items; for example, (100 to 110) is a range from 100 to 110. See Section 3.3 of XPath 2.0. You can also combine sequences of nodes with the union, intersect, and except operators. See Section 3.3.2 of XPath 2.0.

Comparison

XPath 2.0 adds the new comparison operators eq, ne, lt, le, gt, and ge, but you can still use =, !=, < (written as &lt;), <= (written as &lt;=), >, and >=. The node comparison operators is and isnot have also been added. Also new are << and >>, which test the order of nodes. The new operators are stricter about the type conversions they allow, and they should be faster and safer as a result. Strong typing means your errors are more likely to be reported at compile time rather than simply giving you the wrong output. See Section 3.5 of XPath 2.0.

For and conditional expressions

For expressions make it possible to process a range of values in one step. For example, sum(for $i in //item return $i/price * $i/quantity) computes the sum of the value of price times quantity over all items. See Section 3.7 in XPath 2.0. Also, you can now use a construct such as if (value[1] gt value[2]) then value[1] else value[2] in expressions. See Section 3.8 in XPath 2.0.

Quantified expressions

XPath 2.0 has new keywords such as some, every, and satisfies, which allow you to test for partial or complete compliance with a given item; for example, if (every $i in //item satisfies $i < 1000) then.... See Section 3.9 of XPath 2.0.

Working with types

You can now test whether an item is an instance of a type; you can cast as a type (change the type) and check whether an item is castable (its type can change); for example, if ($x castable as xs:date) tests whether the string in $x is a valid date; you can also treat as a type (meaning temporarily treat a type as another type).
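Several of the expression types in this list can be tried out in one small stylesheet. The following is only a sketch, written against the final XPath 2.0 syntax rather than the May 2003 draft; the literal values are made up for illustration, and any XML document will do as input because the template only matches the document node.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">
<xsl:output method="text"/>

<xsl:template match="/">
 <!-- Range expression: the integers 1 through 5 -->
 <xsl:value-of select="1 to 5" separator=" "/>
 <xsl:text>&#10;</xsl:text>
 <!-- For expression: square each item, then sum the resulting sequence -->
 <xsl:value-of select="sum(for $i in (1 to 5) return $i * $i)"/>
 <xsl:text>&#10;</xsl:text>
 <!-- Quantified expression: true only if every item is less than 1000 -->
 <xsl:value-of select="every $i in (100, 101, 102) satisfies $i lt 1000"/>
 <xsl:text>&#10;</xsl:text>
 <!-- Conditional expression with castable: test a string before trusting it -->
 <xsl:value-of select="if ('2003-10-03' castable as xs:date)
                       then 'valid date' else 'not a date'"/>
 <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

With a conforming XSLT 2.0 processor, the four lines of output should be 1 2 3 4 5, then 55, then true, then valid date.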

This is by no means a complete review of all the changes and additions in XSLT 2.0 or XPath 2.0; it’s just a quick look at a good number of them. These are working drafts, so it is possible that they will change somewhat before they become recommendations. Fortunately, you can start playing with some of the new features today by using Saxon 7.7 (or later), which is an experimental implementation of XSLT 2.0 and XPath 2.0. The remaining sections of this chapter try out some of these features, the first of which is the result-document element.

Multiple Result Trees

In the last chapter, you used the saxon:output extension element to create more than one result tree from a single stylesheet. XSLT 2.0 has integrated this functionality into the mainstream of the specification with the result-document element. The following example shows you how to use this element to produce three result trees from one source tree.

Example 16-1, the document functions.xml in examples/ch16, describes the new context-related functions from XPath 2.0.

Example 16-1. A document describing XPath 2.0 functions
<?xml version="1.0"?>
   
<functions type="context">
 <function>
  <name>fn:context-item(  )</name>
  <description>Returns the context item.</description>
 </function>
 <function>
  <name>fn:position(  )</name>
  <description>Returns the position of the context item within the sequence of items 
currently being processed.</description>
 </function>
 <function>
  <name>fn:last(  )</name>
  <description>Returns the number of items in the sequence of items currently being processed.</description>
 </function>
 <function>
  <name>fn:current-dateTime(  )</name>
  <description>Returns the current xs:dateTime.</description>
 </function>
 <function>
  <name>fn:current-date(  )</name>
  <description>Returns the current xs:date.</description>
 </function>
 <function>
  <name>fn:current-time(  )</name>
  <description>Returns the current xs:time.</description>
 </function>
 <function>
  <name>fn:default-collation(  )</name>
  <description>Returns the value of the default collation property from the static context.</description>
 </function>
 <function>
  <name>fn:implicit-timezone(  )</name>
  <description>Returns the value of the implicit timezone property from the evaluation context.</description>
 </function>
</functions>

The descriptions of the functions are from the specification. The fn:position( ) and fn:last( ) functions are the same as the position( ) and last( ) functions from XPath 1.0. The fn:context-item( ) function is similar to the current( ) function available from XSLT 1.0 and XSLT 2.0. Usually, a context item is the same as the current item, except when a predicate is involved.

Tip

You don’t need to worry about the namespace prefix fn: for functions, because you won’t need to use it in XSLT. It’s there because XPath can be used from other environments besides XSLT, and some may use different function libraries, so it’s useful to use namespaces to distinguish the functions as being from different libraries.

Example 16-2, the context.xsl stylesheet, produces four result trees based on functions.xml. The default result tree is text, and the three others are for XML, HTML, and XHTML output, respectively.

Example 16-2. An XSLT 2.0 stylesheet that produces four kinds of output
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:output name="xml" method="xml" indent="yes"/>
<xsl:output name="html" method="html" indent="yes"/>
<xsl:output name="xhtml" method="xhtml" indent="yes"/>
<xsl:param name="dir">file:///C:/LearningXSLT/examples/ch16</xsl:param>
   
<xsl:template match="functions">
 <xsl:text>XPath 2.0 Context Functions&#10;</xsl:text>
 <xsl:text>Date: </xsl:text>
 <xsl:value-of select="current-date(  )"/>
 <xsl:text>&#10;</xsl:text>
 <xsl:apply-templates select="function" mode="text"/>
 <xsl:message terminate="no">Printing text result tree...</xsl:message>
 <xsl:result-document format="xml" href="{$dir}/context.xml">
  <xsl:message terminate="no">Printing XML result tree in context.xml...</xsl:message>
  <list>
   <description>XPath 2.0 Context Functions</description>
   <date><xsl:value-of select="current-date(  )"/></date>
   <xsl:apply-templates select="function" mode="xml"/>
  </list>
 </xsl:result-document>
 <xsl:result-document format="html" href="{$dir}/context.html">
  <xsl:message terminate="no">Printing HTML result tree in context.html...</xsl:message>
  <html>
  <body>
  <h2>XPath 2.0 Context Functions</h2>
  <h3>Date: <xsl:value-of select="current-date(  )"/></h3>
  <ul>
   <xsl:apply-templates select="function" mode="html"/>
  </ul>
  </body>
  </html>
 </xsl:result-document>
 <xsl:result-document format="xhtml" href="{$dir}/context-x.html">
  <xsl:message terminate="no">Printing XHTML result tree in context-x.html...</xsl:message>
  <html xmlns="http://www.w3.org/1999/xhtml">
  <body>
  <h2>XPath 2.0 Context Functions</h2>
  <h3>Date: <xsl:value-of select="current-date(  )"/></h3>
  <ol>
   <xsl:apply-templates select="function" mode="xhtml"/>
  </ol>
  </body>
  </html>
 </xsl:result-document>
</xsl:template>
   
<xsl:template match="function" mode="text">
 <xsl:text> - </xsl:text>
 <xsl:value-of select="name"/>
 <xsl:text>&#10;</xsl:text>
</xsl:template>
   
<xsl:template match="function" mode="xml">
 <function><xsl:value-of select="name"/></function>
</xsl:template>
   
<xsl:template match="function" mode="html">
 <li><xsl:value-of select="name"/></li>
</xsl:template>
   
<xsl:template match="function" mode="xhtml">
 <li xmlns="http://www.w3.org/1999/xhtml"><xsl:value-of select="name"/></li>
</xsl:template>
   
</xsl:stylesheet>

The version attribute on stylesheet shows the 2.0 version number. There are four output elements, three of which are named. This allows a result-document element to reference an output element by name, hence to use the information in it. A global parameter named dir holds the name of the directory where three of the result trees are written as files. This information is referenced by the attribute value template {$dir} in the href attributes on the result-document elements. You could pass in a new value for the dir parameter if you want to change the destination of the output.

The template matching functions creates a text result tree, plus three other result trees inside result-document elements. A message element announces each result tree as it is produced. Each result tree also applies templates to the function elements, though each in a different mode (text, xml, html, and xhtml). The different modes help create an appropriate tree for each of the given formats. The new current-date( ) function is called in each result tree, too.

To get this to work, you need to use a full Java version of Saxon, preferably Version 7.7 or later, available from http://saxon.sourceforge.net or in the examples/ch16 directory as saxon7-7.zip (the JAR file saxon7.jar has already been extracted from saxon7-7.zip). For specific instructions on how to download, install, and use Saxon with the Java interpreter, see the appendix.

Once everything is installed and working, you can type this command:

java -jar saxon7.jar functions.xml context.xsl

and you will get the following text result tree, plus messages about the other three:

Printing text result tree...
Printing XML result tree in context.xml...
Printing HTML result tree in context.html...
Printing XHTML result tree in context-x.html...
XPath 2.0 Context Functions
Date: 2003-10-03
 - fn:context-item(  )
 - fn:position(  )
 - fn:last(  )
 - fn:current-dateTime(  )
 - fn:current-date(  )
 - fn:current-time(  )
 - fn:default-collation(  )
 - fn:implicit-timezone(  )

The files that the three result-document elements produced contain the other result trees. The first one is context.xml:

<?xml version="1.0" encoding="UTF-8"?>
<list>
   <description>XPath 2.0 Context Functions</description>
   <date>2003-10-03</date>
   <function>fn:context-item(  )</function>
   <function>fn:position(  )</function>
   <function>fn:last(  )</function>
   <function>fn:current-dateTime(  )</function>
   <function>fn:current-date(  )</function>
   <function>fn:current-time(  )</function>
   <function>fn:default-collation(  )</function>
   <function>fn:implicit-timezone(  )</function>
</list>

The second is context.html, an HTML document that uses an unordered (bulleted) list:

<html>
   <body>
      <h2>XPath 2.0 Context Functions</h2>
      <h3>Date: 2003-10-03</h3>
      <ul>
         <li>fn:context-item(  )</li>
         <li>fn:position(  )</li>
         <li>fn:last(  )</li>
         <li>fn:current-dateTime(  )</li>
         <li>fn:current-date(  )</li>
         <li>fn:current-time(  )</li>
         <li>fn:default-collation(  )</li>
         <li>fn:implicit-timezone(  )</li>
      </ul>
   </body>
</html>

And the third is context-x.html, an XHTML document that uses an ordered (numbered) list:

<html xmlns="http://www.w3.org/1999/xhtml">
   <body>
      <h2>XPath 2.0 Context Functions</h2>
      <h3>Date: 2003-10-03</h3>
      <ol>
         <li>fn:context-item(  )</li>
         <li>fn:position(  )</li>
         <li>fn:last(  )</li>
         <li>fn:current-dateTime(  )</li>
         <li>fn:current-date(  )</li>
         <li>fn:current-time(  )</li>
         <li>fn:default-collation(  )</li>
         <li>fn:implicit-timezone(  )</li>
      </ol>
   </body>
</html>

As you can see, result-document makes it convenient to create more than one result tree from just one stylesheet. Next is an example that uses regular expressions.

Using Regular Expressions

Regular expressions allow you to define specific patterns for searching strings of text. XML Schema supports regular expressions, and XSLT 2.0 relies on XML Schema-style regular expressions. Table 16-1 shows a sampling of symbols used in regular expressions that XSLT 2.0 supports. The table represents only a few of the possibilities.

Table 16-1. Sample of regular expression symbols

Regular Expression

Description

.

Matches any character except a newline or carriage return.

*

Matches zero or more occurrences of the preceding character or group.

?

Matches zero or one occurrence of the preceding character or group.

\s

Matches any whitespace character, including a space, tab, newline, or carriage return.

\S or [^\s] or [^#x20\t\n\r]

Matches any character except a whitespace character.

\d or [0-9]

Matches any digit.

\d{3}

Matches any three digits.

\D or [^\d] or [^0-9]

Matches any character except a digit.

^

Matches the beginning of a line.

$

Matches the end of a line.

\p{Ll}{5}

Matches any five lowercase letters.

\p{Lu}{6}

Matches any six uppercase letters.

\p{P}{1}

Matches any single punctuation character.

In regular expressions, you can mix these symbols with actual characters to form a search string. For example, using these symbols, you could match:

  • A U.S.-style 9-digit ZIP code, such as 10048-1000, with \d{5}-\d{4}

  • A U.S.-style 10-digit phone number, such as (800)555-1234, with \(\d{3}\)\d{3}-\d{4}

  • The word The at the beginning of a line, followed by a whitespace character, followed by any character, with the expression ^The\s.
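The ZIP code pattern from the list above can be tested with a few lines of XSLT 2.0. This is only a sketch: the two sample strings are made up, the second one is deliberately malformed, and any XML document will do as input. The ^ and $ anchors are my addition so that the whole string, not just part of it, has to match.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="/">
 <!-- Hypothetical sample values; XSLT 2.0 lets for-each iterate over strings -->
 <xsl:for-each select="('10048-1000', '10048-10')">
  <xsl:value-of select="."/>
  <xsl:text>: </xsl:text>
  <!-- Five digits, a hyphen, and four digits, anchored at both ends -->
  <xsl:value-of select="matches(., '^\d{5}-\d{4}$')"/>
  <xsl:text>&#10;</xsl:text>
 </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

The first string should report true and the second false. The curly braces are safe here because select is not an attribute value template; in an attribute that is one, such as regex on analyze-string, they would have to be doubled.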

XPath 2.0 adds three new functions for use with regular expressions: matches( ), replace( ), and tokenize( ). For more information on these new functions, see Section 7.5 of the functions and operators specification for XPath 2.0 and XQuery 1.0 at http://www.w3.org/TR/xpath-functions/. XSLT 2.0 offers the new analyze-string element. See Section 15 of the XSLT 2.0 spec at http://www.w3.org/TR/xslt20/ for more information on that. I’ll show you examples of the matches( ) and replace( ) functions, and the analyze-string element.

Tip

The tokenize( ) function is not demonstrated in this chapter. It breaks a string into tokens, using a regular expression, such as one or more whitespace characters (\s+), to identify the separators between them.

The matches( ) Function

The function matches( ) is new in XPath 2.0. This function returns an xs:boolean value that indicates whether the value in the first argument matches the regular expression in the value of the second argument. The stylesheet match.xsl, in Example 16-3, uses the matches( ) function to test whether a string matches a regular expression.

Example 16-3. A stylesheet matching on a regular expression
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
   
<xsl:template match="functions">
 <xsl:element name="list">
  <xsl:element name="description">XPath 2.0 Context Functions</xsl:element>
  <xsl:element name="date">
   <xsl:value-of select="current-date(  )"/>
  </xsl:element>
   <xsl:apply-templates select="function"/>
 </xsl:element>
</xsl:template>
   
<xsl:template match="function">
 <xsl:copy>
  <xsl:if test="matches(name,'^fn:')">
   <xsl:value-of select="substring(name, 4)"/>
  </xsl:if>
 </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>

The first template rule uses a new XPath 2.0 function, current-date( ), to insert the current date into a date element in the result tree, then it applies templates for function elements. In the second template rule, the first argument of matches( ) is name—a child node of function. The content of name is the string that this function attempts to match. The second argument is a regular expression. ^fn: looks for the letters fn: at the beginning of the line (^). If matches( ) finds ^fn: and returns true, the value-of element in the template of if writes a substring from the content of name beginning from the fourth character, thus eliminating fn:.

Transform functions.xml with match.xsl with:

java -jar saxon7.jar functions.xml match.xsl

and you will see this result:

<?xml version="1.0" encoding="UTF-8"?>
<list>
   <description>XPath 2.0 Context Functions</description>
   <date>2003-10-03</date>
   <function>context-item(  )</function>
   <function>position(  )</function>
   <function>last(  )</function>
   <function>current-dateTime(  )</function>
   <function>current-date(  )</function>
   <function>current-time(  )</function>
   <function>default-collation(  )</function>
   <function>implicit-timezone(  )</function>
</list>

The replace( ) Function

The new replace( ) function in XPath 2.0 returns the value of the first argument with every substring matched by the regular expression in the second argument replaced by the string in the third argument. Example 16-4, the stylesheet replace.xsl, shows you how it works.

Example 16-4. A stylesheet replacing regular expressions
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
   
<xsl:template match="functions">
 <xsl:element name="list">
  <xsl:element name="description">XPath 2.0 Context Functions</xsl:element>
  <xsl:element name="date">
   <xsl:value-of select="current-date(  )"/>
  </xsl:element>
   <xsl:apply-templates select="function"/>
 </xsl:element>
</xsl:template>
   
<xsl:template match="function">
 <xsl:copy>
  <xsl:value-of select="replace(name, '^fn:', '')"/>
 </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>

The first argument of replace( ) is name, meaning the content of the name element. The second argument is the regular expression you are looking for, and the third argument is the string that replaces each match (here an empty string, which deletes the fn: prefix). If you process functions.xml with:

java -jar saxon7.jar functions.xml replace.xsl

it will produce the same output as match.xsl.
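The replacement string can do more than delete text. As a hedged sketch, the variation below uses parenthesized groups in the regular expression and refers back to them with $1 and $2 in the replacement. This behavior is defined in the final functions and operators specification, so a processor based on the May 2003 draft may not support it; the bracketed output format is simply made up for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="functions">
 <list>
  <xsl:apply-templates select="function"/>
 </list>
</xsl:template>

<xsl:template match="function">
 <xsl:copy>
  <!-- $1 is the prefix before the colon, $2 is everything after it -->
  <xsl:value-of select="replace(name, '^([a-z]+):(.*)$', '$2 [$1]')"/>
 </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Applied to functions.xml, each name should come out with its prefix moved to the end in brackets, so that fn:last(  ) becomes last(  ) [fn].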

The analyze-string Element

Finally, the instruction element analyze-string is also new in XSLT 2.0. This element allows you to select a string using the select attribute, and then search that string with a regular expression defined in a regex attribute. Two elements can appear as children of analyze-string: matching-substring, which defines what happens when analyze-string finds a matching substring, and non-matching-substring, which defines what happens when it finds a non-matching substring. You can use either matching-substring or non-matching-substring or both. (Also, analyze-string accepts fallback as a child.)

The regex.xsl stylesheet, Example 16-5, uses analyze-string to handle some text in a node.

Example 16-5. A stylesheet performing more complex regular expressions processing
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
   
<xsl:template match="functions">
 <xsl:element name="list">
  <xsl:element name="description">XPath 2.0 Context Functions</xsl:element>
  <xsl:element name="date">
   <xsl:value-of select="current-date(  )"/>
  </xsl:element>
   <xsl:apply-templates select="function"/>
 </xsl:element>
</xsl:template>
   
<xsl:template match="function">
 <xsl:copy>
  <xsl:analyze-string select="name" regex="^fn:">
   <xsl:matching-substring></xsl:matching-substring>
   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>

The second template rule analyzes the content of the name child of each function element in the source tree. When analyze-string finds the string fn: at the beginning of the string, the empty matching-substring element writes nothing to the result tree in its place, and the non-matching-substring element outputs the rest of the name as is using value-of.

Execute the transformation with this command:

java -jar saxon7.jar functions.xml regex.xsl

and you will get the following result:

<?xml version="1.0" encoding="UTF-8"?>
<list>
   <description>XPath 2.0 Context Functions</description>
   <date>2003-08-26</date>
   <function>context-item(  )</function>
   <function>position(  )</function>
   <function>last(  )</function>
   <function>current-dateTime(  )</function>
   <function>current-date(  )</function>
   <function>current-time(  )</function>
   <function>default-collation(  )</function>
   <function>implicit-timezone(  )</function>
</list>

Tip

This same effect can be achieved by using replace( ) or even matches( ), as you saw earlier. The main reason for using analyze-string is to handle replacement text that contains elements; for example, you could use analyze-string to replace a line break with a br element.
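The line-break case mentioned in this tip might look something like the following sketch. It assumes functions.xml as the source, where one of the description elements happens to contain a line break, and it follows the final XSLT 2.0 syntax for analyze-string. The choice of HTML output with a p element around each description is mine.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>

<xsl:template match="functions">
 <html>
  <body>
   <xsl:apply-templates select="function/description"/>
  </body>
 </html>
</xsl:template>

<xsl:template match="description">
 <p>
  <xsl:analyze-string select="." regex="\n">
   <!-- Each line break in the text becomes an empty br element -->
   <xsl:matching-substring><br/></xsl:matching-substring>
   <!-- The text between line breaks is copied through unchanged -->
   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
  </xsl:analyze-string>
 </p>
</xsl:template>

</xsl:stylesheet>
```

Neither replace( ) nor matches( ) could do this on its own, because both work with strings, while the br element has to be created as a node in the result tree.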

These examples give you a taste of what is possible using regular expressions. For more information on the regular expressions used by XML Schema, and XSLT 2.0 by association, see http://www.w3.org/TR/xmlschema-0.html#regexAppendix and http://www.w3.org/TR/xmlschema-2.html#regexs.

Grouping in XSLT 2.0

Grouping in XSLT is the process of collecting nodes into groups based on a given criterion. In XSLT 1.0, the process is a little complicated and requires somewhat elaborate expressions, often employing the preceding-sibling axis to check whether a node belongs to a group. You could also group nodes with a key using the Muenchian method, which was demonstrated in Chapter 11. You can also read about how to do XSLT 1.0 grouping in Chapter 6 of Doug Tidwell’s XSLT (O’Reilly) or in Chapter 9 of Michael Kay’s XSLT Programmer’s Reference, Second Edition (Wrox). I prefer grouping in XSLT 2.0 because it is much simpler and easier to explain, a preference that probably grew out of my experience with grouping in Version 1.0.

Grouping in XSLT 1.0 usually brings the for-each instruction element into service. XSLT 2.0 has a new instruction element called for-each-group that makes grouping a relative snap. I’ll show you how in the following example.

Glance at group2.xml, in Example 16-6, which lumps XPath 2.0’s context-related functions into two piles by labeling them with a type attribute.

Example 16-6. A list of XPath 2.0 context-related functions
<?xml version="1.0"?>
   
<list>
 <description>XPath 2.0 Context Functions</description>
 <date>2003-10-03</date>
 <function type="new">context-item(  )</function>
 <function type="new">current-date(  )</function>
 <function type="new">current-dateTime(  )</function>
 <function type="new">current-time(  )</function>
 <function type="new">default-collation(  )</function>
 <function type="new">implicit-timezone(  )</function>
 <function type="legacy">last(  )</function>
 <function type="legacy">position(  )</function>
</list>

The eight functions in this list are either legacy or new functions. The group2.xsl stylesheet, in Example 16-7, groups the functions in group2.xml according to the content of the type attribute.

Example 16-7. A stylesheet grouping elements using XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
   
<xsl:template match="list">
<xsl:copy>
 <xsl:for-each-group select="function" group-by="@type">
  <functions type="{@type}">
   <xsl:value-of select="current-group(  )" separator=", "/>
  </functions>
 </xsl:for-each-group>
</xsl:copy>
</xsl:template>
   
</xsl:stylesheet>

The for-each-group element uses its select attribute to pick the nodes to group, namely all function children of list. The group-by attribute determines the key for grouping, which, in this case, is the value of the type attribute in the source. The functions literal result element uses an attribute value template to reflect the value of the type attribute. This works because, inside for-each-group, the context node is the first node of the group being processed.

The value-of element’s select attribute uses the current-group( ) function, also a new kid on the block in XSLT 2.0, to return all the nodes in the group currently being processed. The separator attribute is also a new addition to XSLT 2.0. It tells the XSLT 2.0 processor to write a comma followed by a space between adjacent values as they are sent to the result tree, but not after the last one.
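As a variation on group2.xsl, the following sketch writes each group member as its own element and records the size of the group. It relies on the current-grouping-key( ) function, which is defined in the XSLT 2.0 Recommendation but may not be available in an early processor such as Saxon 7.7, so treat that call as an assumption. The count attribute is likewise something I added for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="list">
<xsl:copy>
 <xsl:for-each-group select="function" group-by="@type">
  <!-- current-grouping-key() is the value of @type shared by this group;
       count(current-group()) is the number of nodes in the group -->
  <functions type="{current-grouping-key()}" count="{count(current-group())}">
   <!-- visit each member of the group and copy its text -->
   <xsl:for-each select="current-group()">
    <function><xsl:value-of select="."/></function>
   </xsl:for-each>
  </functions>
 </xsl:for-each-group>
</xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Applied to group2.xml, this should yield one functions element with six function children for the new group and another with two children for the legacy group.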

Tip

In XSLT 1.0, value-of outputs only the first node of a returned node-set in string form; in XSLT 2.0, all nodes can be returned, so you have to plan accordingly.

As you might guess, for-each-group has several other attributes, namely group-adjacent, group-starting-with, group-ending-with, and collation. I’m not going to cover them here, but you can read more about for-each-group and its attributes in Section 14 of the XSLT 2.0 specification.

Use this command to transform group2.xml:

java -jar saxon7.jar group2.xml group2.xsl

The result is two lists of functions, grouped and comma-separated, in functions elements:

<?xml version="1.0" encoding="UTF-8"?>
<list>
   <functions type="new">context-item(  ), current-date(  ), current-dateTime(  ), 
current-time(  ), default-collation(  ), implicit-timezone(  )</functions>
   <functions type="legacy">last(  ), position(  )</functions>
</list>
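For comparison, here is a rough XSLT 1.0 counterpart that uses the Muenchian method mentioned at the start of this section. It is a sketch rather than a listing from Chapter 11, and the key name by-type is my own invention. It should group the functions in group2.xml the same way, though the whitespace in the output may differ slightly.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<!-- index every function element by the value of its type attribute -->
<xsl:key name="by-type" match="function" use="@type"/>

<xsl:template match="list">
<xsl:copy>
 <!-- select only the first function of each type -->
 <xsl:for-each select="function[generate-id() = generate-id(key('by-type', @type)[1])]">
  <functions type="{@type}">
   <!-- visit every function that shares this type -->
   <xsl:for-each select="key('by-type', @type)">
    <xsl:value-of select="."/>
    <!-- emulate separator=", " by hand -->
    <xsl:if test="position() != last()">, </xsl:if>
   </xsl:for-each>
  </functions>
 </xsl:for-each>
</xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

The key declaration, the generate-id( ) test, and the hand-built separator all collapse into the single for-each-group element and the separator attribute in XSLT 2.0.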

The group2.xsl example should give you a feel for how to group nodes in XSLT 2.0. In the example that follows, you will learn how to use the new top-level function element.

Extension Functions

You learned about external extension functions in the last chapter. In XSLT 2.0, you can now define functions at the stylesheet level using the function element. These are called stylesheet functions, but they work like any extension function in an expression. The difference is that they are completely portable between one XSLT 2.0 processor and another.

Example 16-8, function.xsl , uses function to declare a stylesheet function.

Example 16-8. Creating extension functions in XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes"
xmlns:wy="http://www.wyeast.net/functions">
<xsl:output method="text"/>
<xsl:function name="wy:kilometers">
 <xsl:param name="miles" as="xs:decimal"/>
 <xsl:sequence select="$miles * 1.609347"/>
</xsl:function>
   
<xsl:template match="/">
 <xsl:apply-templates select="trip"/>
</xsl:template>
   
<xsl:template match="trip">
 <xsl:apply-templates select="distance"/>
</xsl:template>
   
<xsl:template match="distance">
 <xsl:text>The distance from </xsl:text>
 <xsl:value-of select="location"/>
 <xsl:text> to </xsl:text>
 <xsl:value-of select="destination"/>
 <xsl:text> is </xsl:text>
 <xsl:value-of select="round(wy:kilometers(miles))"/>
 <xsl:text> kilometers.&#10;</xsl:text>
</xsl:template>
   
</xsl:stylesheet>

When I tested this, it appeared that stylesheet functions must have at least one argument, but this may not be the case, given that 2.0 is still in the early stages. Stylesheet functions must also be identified with a QName that uses a prefix (this ensures that user-defined functions don’t clash with system-defined functions). The namespace URI and prefix associated with the QName in this example are http://www.wyeast.net/functions and wy, respectively. The namespace is declared on the stylesheet element.

The function element, which must appear at the top level, declares the stylesheet function named wy:kilometers( ). The function performs a simple conversion of miles to kilometers by accepting a single parameter, miles. Parameters for stylesheet functions are defined with param elements but cannot have default values. The new as attribute on param declares the value of miles as an xs:decimal value, according to the boundaries set by XML Schema datatypes (the namespace is declared on the document element).

The new XSLT 2.0 sequence element adds a sequence of nodes or atomic values to the result tree. In this case, it returns a product (a single atomic value) and works much like value-of. In other situations, you can add existing nodes to a sequence with this element, not just new ones. The factor for converting miles to kilometers (1.609347) comes from the National Institute of Standards and Technology (NIST), and is based on the U.S. survey foot (see http://physics.nist.gov/Pubs/SP811/appenB8.html).
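To show sequence returning existing nodes rather than a computed value, here is a sketch of a second stylesheet function. The name wy:long-legs and its min-miles parameter are hypothetical. The sketch also binds the xs prefix to http://www.w3.org/2001/XMLSchema, the namespace used by the final XSLT 2.0 Recommendation, which is an assumption that differs from Example 16-8 and may not suit Saxon 7.7.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:wy="http://www.wyeast.net/functions">
<xsl:output method="text"/>

<!-- hypothetical function: returns the existing distance elements
     whose miles value exceeds the given minimum -->
<xsl:function name="wy:long-legs" as="element(distance)*">
 <xsl:param name="trip" as="element(trip)"/>
 <xsl:param name="min-miles" as="xs:decimal"/>
 <xsl:sequence select="$trip/distance[miles &gt; $min-miles]"/>
</xsl:function>

<xsl:template match="trip">
 <!-- iterate over the nodes the function returned -->
 <xsl:for-each select="wy:long-legs(., 100)">
  <xsl:value-of select="location"/>
  <xsl:text> to </xsl:text>
  <xsl:value-of select="destination"/>
  <xsl:text>&#10;</xsl:text>
 </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

The function hands back the original distance nodes, not copies of them, so the calling template can still navigate to their location and destination children.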

The wy:kilometers( ) function is called later in the stylesheet in a value-of element. It takes a miles node as an argument, and its return value is rounded to the nearest whole number with the round( ) function. The result is output as text, embedded in a sentence formed from the nodes in the source tree.

Soon, you’ll apply this stylesheet to trip.xml, shown in Example 16-9, which holds the road mileage between several U.S. cities.

Example 16-9. Mileage between selected U.S. cities
<?xml version="1.0"?>

<trip>
 <distance>
  <location>Tucson</location>
  <destination>Flagstaff</destination>
  <miles>253</miles>
 </distance>
 <distance>
  <location>Portland</location>
  <destination>Medford</destination>
  <miles>272</miles>
 </distance>
 <distance>
  <location>Denver</location>
  <destination>Colorado Springs</destination>
  <miles>67</miles>
 </distance>
</trip>

Perform the transformation with:

java -jar saxon7.jar trip.xml function.xsl

You will see this outcome on your screen:

The distance from Tucson to Flagstaff is 407 kilometers.
The distance from Portland to Medford is 438 kilometers.
The distance from Denver to Colorado Springs is 108 kilometers.

The wy:kilometers( ) stylesheet function may be reused as often as you need it in this stylesheet. A stylesheet function can also be included or imported from another stylesheet.
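Here is a sketch of how importing might look. The filenames wy-functions.xsl and trip-km.xsl are hypothetical, as is the shortened output format. As an assumption, the module binds xs to http://www.w3.org/2001/XMLSchema, the namespace used by the final XSLT 2.0 Recommendation, rather than the one in Example 16-8. First, the module that holds only the function (wy-functions.xsl):

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:wy="http://www.wyeast.net/functions">

<!-- same conversion as in Example 16-8 -->
<xsl:function name="wy:kilometers">
 <xsl:param name="miles" as="xs:decimal"/>
 <xsl:sequence select="$miles * 1.609347"/>
</xsl:function>

</xsl:stylesheet>
```

Then the stylesheet that imports it (trip-km.xsl):

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wy="http://www.wyeast.net/functions">
<!-- import must come before any other top-level element -->
<xsl:import href="wy-functions.xsl"/>
<xsl:output method="text"/>

<xsl:template match="trip">
 <xsl:apply-templates select="distance"/>
</xsl:template>

<xsl:template match="distance">
 <xsl:value-of select="destination"/>
 <xsl:text>: </xsl:text>
 <!-- wy:kilometers() is defined in the imported module -->
 <xsl:value-of select="round(wy:kilometers(miles))"/>
 <xsl:text> km&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

The importing stylesheet has to declare the wy namespace itself, because the prefix used in a function call is resolved in the stylesheet where the call appears.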

Summary

XSLT 2.0 and XPath 2.0 offer an almost overwhelming number of new features. Some have complained about the new versions of XSLT and XPath on this count. Personally, I like most of the new offerings and, fortunately, no one is forced to adopt all the new functionality. Nevertheless, the terminology will definitely require devotees to plow deeply into the new specifications in order to get a grip on it.

This chapter lightly introduced you to many highlights from these new technologies. It also walked you through how to output multiple result documents, define and use regular expressions, use grouping, and create stylesheet functions.

The next chapter shows programmers how to use APIs to write their own interface to an XSLT processor.
