Although XSLT 2.0 and XPath 2.0 are still working drafts at the time of this writing, they are nearing completion, and there are some partial implementations available for these specs, such as Saxon 7.7 (check http://saxon.sourceforge.net for the latest version). This chapter attempts to summarize some of the more interesting features in these specifications, and demonstrates a few of them, too. But it won’t be an exhaustive review of XSLT 2.0 or XPath 2.0, partly because these specs are still changing, and partly because an exhaustive review would take up a whole book by itself.
The material in this chapter is based on the May 2003 working drafts of XSLT 2.0 and XPath 2.0, so it is possible that things will change in those drafts by the time you read this.
First of all, I’ll highlight some of the changes that have been made since XSLT 1.0 and XPath 1.0, and I’ll also mention a few of the features that have been added. Then I’ll show you how you can put some of this new stuff to work today.
Rather than just two specifications, as is the case with XSLT 1.0 and XPath 1.0, the next versions of these specs are broken into five documents. Three new documents have been broken out for those features of XSLT and XPath that also support the XML Query Language (see http://www.w3.org/TR/xquery/).
The XSLT 2.0 specification, the evolution of XSLT 1.0, is about twice as long as its predecessor. Although it’s lengthy, I think this spec is clearer than 1.0, and it even sports a glossary.
XPath has also evolved; the data model and functions are now documented in separate specifications.
XPath has an upgraded data model that applies to XQuery as well. The terminology used to describe the data model has been changed and refined, so although the data model for XSLT 2.0 is technically very similar to that of XPath 1.0, it is now described in more formal language.
Many functions that also support XQuery have been added to XPath. The function library has more than tripled in size, from under 30 functions in 1.0 to over 100 in 2.0 (counting functions in all signatures).
The description of how result trees are serialized, which was previously an integral part of the XSLT spec, has been pulled out into a separate document so that it can be used in non-XSLT environments such as XQuery.
Listed below are some of the new features added to the XSLT 2.0 specification:
XSLT makes a number of refinements to terminology, and a glossary is now available at the end of the specification. For example, the term result tree fragment has been replaced by the term temporary tree. A temporary tree is natively a sequence of nodes, obviating the need for an extension function for node-sets to cast a result tree fragment to a node-set. Another example: a template is now known as a sequence constructor. A sequence can contain nodes or atomic values.
In addition to xml, html, and text, XSLT 2.0 adds the xhtml output method (see Section 20 of XSLT 2.0 and Section 5 of the serialization specification).
One of the most welcome new features in XSLT 2.0 is the ability to produce multiple result trees, rather than just one. This is accomplished through the result-document element. This element is similar to the saxon:output element you saw in the last chapter, though it has somewhat different attributes. You will see an example of this in Section 16.3, later in this chapter. Also see Section 19.1 of XSLT 2.0.
A regular expression describes text with a pattern made up of characters that have special meaning within the expression. The analyze-string element, together with its matching-substring and non-matching-substring child elements, allows you to analyze a string using a regular expression. The XPath 2.0 functions matches( ), replace( ), and tokenize( ) also make use of regular expressions. See “Using Regular Expressions” later in this chapter for an example. See also Section 15 of XSLT 2.0.
A schema-aware XSLT processor supports validation using W3C XML Schema. This support is not required, however. There is also a conformance level for a basic XSLT processor that does not support validation. See Section 21 of XSLT 2.0. XML Schema support, in fact, goes well beyond just validation (in the sense of rejecting invalid documents). Once a source document has been validated against a schema, you can use information about the types of different nodes. For example, you could write a template rule that processes any attribute of type date.
Just as numbers could be formatted with the format-number( ) function and the decimal-format element in XSLT 1.0, a date may be formatted with the format-date( ) function used with the date-format element. See Section 16.5 of XSLT 2.0.
A new character map declaration, written with the character-map element, lets a stylesheet define a set of character substitutions for output. Each output-character element inside it maps a single character to a string for output. This functionality is an improvement over the disable-output-escaping attribute functionality in XSLT 1.0. See Section 20.1 of XSLT 2.0.
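As a minimal sketch of a character map, the following stylesheet writes every no-break space as the literal string &nbsp; on output. It assumes the syntax of the final XSLT 2.0 Recommendation, which may differ in detail from the May 2003 draft, and the map name nbsp-map is a hypothetical name chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Attach the named character map to the default output definition -->
  <xsl:output method="html" use-character-maps="nbsp-map"/>

  <!-- nbsp-map is a hypothetical name used only in this sketch -->
  <xsl:character-map name="nbsp-map">
    <!-- On serialization, write U+00A0 as the six characters &nbsp; -->
    <xsl:output-character character="&#160;" string="&amp;nbsp;"/>
  </xsl:character-map>

  <!-- Identity-style copy, so the map has some content to act on -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the substitution happens only during serialization, the result tree itself still contains the real no-break space character.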
Using the new for-each-group element, XSLT 2.0 now offers a built-in grouping feature, rather than depending on common yet nonstandard approaches used in XSLT 1.0. See “Grouping in XSLT 2.0” in this chapter as well as Section 14 of XSLT 2.0.
You can pass a parameter to an overridden template rule in an imported stylesheet using with-param as a child of the apply-imports element. You can also pass parameters using the next-match element, which applies the next matching template rule besides the current one (which happens to have the highest priority). See Section 6.7 of XSLT 2.0.
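Here is a small sketch of next-match with a parameter. It assumes an input document with a list element holding function elements that carry a type attribute; the new and item element names and the label parameter are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="list">
    <list>
      <xsl:apply-templates select="function"/>
    </list>
  </xsl:template>

  <!-- Higher-priority rule: wraps whatever the next matching rule produces -->
  <xsl:template match="function[@type='new']" priority="2">
    <new>
      <xsl:next-match>
        <!-- Pass a parameter down to the lower-priority rule -->
        <xsl:with-param name="label" select="'added in 2.0'"/>
      </xsl:next-match>
    </new>
  </xsl:template>

  <!-- Lower-priority rule: receives the parameter, or uses its default -->
  <xsl:template match="function">
    <xsl:param name="label" select="'unchanged'"/>
    <item note="{$label}"><xsl:value-of select="."/></item>
  </xsl:template>

</xsl:stylesheet>
```

A function element of type new is handled by both rules in turn, while any other function element goes straight to the second rule and gets the default label.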
Besides those already mentioned, XSLT 2.0 adds a half dozen other elements:
function element
Defines a stylesheet function. Stylesheet functions are similar to named templates, except that rather than invoking them using a call-template instruction, you can invoke them using a function call anywhere in an XPath expression. This makes them more versatile than templates; for example, you can write a function to compute a sort key.
import-schema element
Imports an XML Schema for validation by a schema-aware XSLT processor.
namespace element
Creates a namespace node. This is useful (in rare cases) when you need to decide at runtime which namespaces to include in the result tree.
next-match element
Invokes the next template rule, of lower priority or import precedence, that matches the current node; works with rules in the current or imported stylesheets.
sequence element
Constructs a sequence of nodes or atomic values.
sort-key element
Declares a named sort key; holds one or more sort elements.
A number of new attributes appear on elements that have existed since XSLT 1.0 and are listed here:
as attribute
Added to key, param, template, and variable, this attribute specifies the required type for the result.
collation attribute
Identifies a named collation for ordering strings; this attribute has now been added to the key and sort elements.
copy-namespaces attribute
Available on the copy and copy-of elements with a value of yes or no. The default is yes.
disable-output-escaping attribute
Now appears on attribute; it appeared only on text and value-of in XSLT 1.0.
type attribute
Appears on attribute, copy, copy-of, and element in order to associate the constructed node with an item type from a schema.
undeclare-namespaces attribute
Appears on output to specify whether to undeclare namespaces in the output. This feature anticipates support for XML Namespaces 1.1, which allows namespaces to be undeclared.
validation attribute
Appears on attribute, copy, copy-of, and element, with one of four possible values: lax, preserve, strict, or strip. This is closely associated with the type attribute.
A number of new attributes also have been added to the output element:
escape-uri-attributes attribute
Specifies whether a processor escapes URIs in HTML and XHTML; value must be yes or no.
include-content-type attribute
Specifies whether to add a meta element in HTML and XHTML output; value must be either yes or no.
name attribute
An output declaration may now be labeled with a name attribute. This is used in conjunction with result-document, which allows multiple result trees; these can either all use the same output format or use a variety of different output formats.
normalize-unicode attribute
Indicates whether, yes or no, the Unicode output should use Normalization Form C (see http://www.unicode.org/unicode/reports/tr15/).
use-character-maps attribute
Identifies a named character map defined by the character-map element.
That’s just a few of the new features in XSLT 2.0; next, I’ll discuss some of the new ones found in XPath 2.0.
Following are just a handful of the new features added to the XPath 2.0 specification:
XPath has tightened up its terminology, and a glossary will be available at the end of the specification in later drafts. For example, the result of an expression is now considered a sequence of zero or more items, and an item is either a node or an atomic value, such as an integer, as defined by XML Schema datatypes (see http://www.w3.org/TR/xmlschema-2/). This is much more than a terminology change. You can now have sequences of integers or strings (there are many more datatypes) as well as sequences of nodes.
XPath 2.0 has over 100 functions, compared with 27 in XPath 1.0 (I’m counting functions with the same name but different signatures or argument lists as one function). They are too numerous to list in this book, but you can peruse them in the new functions and operators specification (see http://www.w3.org/TR/xpath-functions/).
XPath 2.0 has grown into a strongly typed language. It recognizes datatypes from XML Schema and also its own datatypes, such as xdt:anyAtomicType. See Section 2.4 of XPath 2.0.
New kind tests are now offered that test kinds of nodes, such as document-node( ), element( ), and attribute( ); for example, document-node( ) matches the document node (root node in XSLT 1.0). You can also test with empty( ) and item( ). The occurrence indicators ? (zero or one), * (zero or more), and + (one or more) are also in the mix; for example, item( )* matches zero or more atomic values or nodes. See Section 2.4 of XPath 2.0.
Sequence expressions allow you to specify a sequence of items that can be atomic values or nodes; for example, (100, 101, 102) will return a sequence of the atomic values 100, 101, and 102, in that order. Range expressions let you represent a range of integers; for example, (100 to 110) is the range from 100 to 110, inclusive. See Section 3.3 of XPath 2.0. You can also combine sequences of nodes with the union, intersect, and except operators. See Section 3.3.2 of XPath 2.0.
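To see these expressions at work, here is a minimal sketch that can be run against any source document. The last instruction assumes the source may contain function elements with a type attribute; if there are none, it simply reports 0.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Matches the document node of whatever source document is supplied -->
  <xsl:template match="/">
    <!-- A literal sequence of three atomic values: writes 100 101 102 -->
    <xsl:value-of select="(100, 101, 102)" separator=" "/>
    <xsl:text>&#10;</xsl:text>
    <!-- A range expression: 100 to 110 holds 11 integers, so this writes 11 -->
    <xsl:value-of select="count(100 to 110)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- except: all function elements minus those whose type is legacy -->
    <xsl:value-of
        select="count(//function except //function[@type='legacy'])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```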
XPath 2.0 adds new comparison operators, such as eq, ne, lt, le, gt, and ge, but you can still use =, !=, < (written as &lt; in a stylesheet), <= (written as &lt;=), >, and >=. The node comparison operators is and isnot have also been added. Also new are << and >>, which test the order of nodes. The new operators are stricter about the type conversions they allow, and they should be faster and safer as a result. Strong typing means your errors are more likely to be reported at compile time rather than simply giving you the wrong output. See Section 3.5 of XPath 2.0.
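The difference between the old and new operators is easiest to see side by side. This sketch assumes a source document shaped like group2.xml (Example 16-6, later in this chapter): a list element whose function children each carry a type attribute.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="list">
    <!-- General comparison: true if ANY type attribute equals 'legacy' -->
    <xsl:value-of select="function/@type = 'legacy'"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Value comparison: compares exactly one value with one value -->
    <xsl:value-of select="function[1]/@type eq 'new'"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Node identity: do both operands select the very same node? -->
    <xsl:value-of select="function[1] is function[last()]"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Document order: the operator must be escaped in an attribute value -->
    <xsl:value-of select="function[1] &lt;&lt; function[2]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With eq, handing the operator more than one value on either side is an error rather than a silent "any of them" test, which is the stricter behavior the paragraph above describes.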
For expressions make it possible to process a sequence of values in one step. For example, sum(for $i in //item return $i/price * $i/quantity) computes the sum of the value of price times quantity over all item elements. See Section 3.7 in XPath 2.0. Also, you can now use a conditional construct such as if (value[1] gt value[2]) then value[1] else value[2] in expressions. See Section 3.8 in XPath 2.0.
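A minimal sketch of both constructs follows. To keep it independent of any particular source document, it iterates over an inline sequence of integers instead of item elements; the variable name totals is hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- for expression: yields the sequence (10, 20, 30) -->
    <xsl:variable name="totals"
        select="for $n in (1, 2, 3) return $n * 10"/>
    <!-- 10 + 20 + 30, so this writes 60 -->
    <xsl:value-of select="sum($totals)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- if expression: picks the larger of the first two items, here 20 -->
    <xsl:value-of
        select="if ($totals[1] gt $totals[2])
                then $totals[1] else $totals[2]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```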
XPath 2.0 has new keywords such as some, every, and satisfies, which allow you to test whether some or all of the items in a sequence meet a given condition; for example, if (every $i in //item satisfies $i < 1000) then .... See Section 3.9 of XPath 2.0.
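The following sketch tries the quantifiers on an inline sequence, so it runs against any source document. The variable name prices and its values are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="prices" select="(200, 400, 900)"/>
    <!-- every: all three values are below 1000, so this writes true -->
    <xsl:value-of select="every $p in $prices satisfies $p lt 1000"/>
    <xsl:text>&#10;</xsl:text>
    <!-- some: 900 is above 500, so this writes true -->
    <xsl:value-of select="some $p in $prices satisfies $p gt 500"/>
    <xsl:text>&#10;</xsl:text>
    <!-- A quantified expression used as the test of an if expression -->
    <xsl:value-of
        select="if (every $p in $prices satisfies $p gt 500)
                then 'all above 500' else 'not all above 500'"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```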
You can now test whether an item is an instance of a type, you can cast as a type (change the type), and you can check whether an item is castable (its type can change); for example, if ($x castable as xs:date) tests whether the string in $x is a valid date. You can also treat as a type (meaning temporarily treat a type as another type).
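Here is a short sketch of these type tests. It assumes the xs prefix is bound to http://www.w3.org/2001/XMLSchema, as in the final recommendation, and uses a hard-coded string in place of real input.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="x" select="'2003-10-03'"/>
    <!-- The literal is a string, so this writes true -->
    <xsl:value-of select="$x instance of xs:string"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The string has the lexical form of a date: true -->
    <xsl:value-of select="$x castable as xs:date"/>
    <xsl:text>&#10;</xsl:text>
    <!-- This string cannot be cast to a date: false -->
    <xsl:value-of select="'not a date' castable as xs:date"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Guard the cast with castable, then use the typed value: 2003 -->
    <xsl:value-of
        select="if ($x castable as xs:date)
                then year-from-date($x cast as xs:date) else 'n/a'"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Testing with castable before casting avoids the dynamic error that cast as raises when the string is not a valid date.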
This is by no means a complete review of all the changes and additions to XSLT 2.0 or XPath 2.0; it’s just a quick discussion of a good number of them. These are working drafts; it is possible that they will change somewhat before they become recommendations. Fortunately, you can start playing with some of the new features today by using Saxon 7.7 (or later), which is an experimental implementation of XSLT 2.0 and XPath 2.0. The remaining sections of this chapter will try out some of these features, the first of which is the result-document element.
In the last chapter, you used the saxon:output extension element to create more than one result tree from a single stylesheet. XSLT 2.0 has integrated this functionality into the mainstream of the specification with the result-document element. The following example shows you how to use this element to produce three result trees from one source tree.
Example 16-1, the document functions.xml in examples/ch16, describes the new context-related functions from XPath 2.0.
<?xml version="1.0"?>

<functions type="context">
 <function>
  <name>fn:context-item( )</name>
  <description>Returns the context item.</description>
 </function>
 <function>
  <name>fn:position( )</name>
  <description>Returns the position of the context item within the sequence of items currently being processed.</description>
 </function>
 <function>
  <name>fn:last( )</name>
  <description>Returns the number of items in the sequence of items currently being processed.</description>
 </function>
 <function>
  <name>fn:current-dateTime( )</name>
  <description>Returns the current xs:dateTime.</description>
 </function>
 <function>
  <name>fn:current-date( )</name>
  <description>Returns the current xs:date.</description>
 </function>
 <function>
  <name>fn:current-time( )</name>
  <description>Returns the current xs:time.</description>
 </function>
 <function>
  <name>fn:default-collation( )</name>
  <description>Returns the value of the default collation property from the static context.</description>
 </function>
 <function>
  <name>fn:implicit-timezone( )</name>
  <description>Returns the value of the implicit timezone property from the evaluation context.</description>
 </function>
</functions>
The descriptions of the functions are from the specification. The fn:position( ) and fn:last( ) functions are the same as the position( ) and last( ) functions from XPath 1.0. The fn:context-item( ) function is similar to the current( ) function available from XSLT 1.0 and XSLT 2.0. Usually, a context item is the same as the current item, except when a predicate is involved.
You don’t need to worry about the namespace prefix fn: for functions, because you won’t need to use it in XSLT. It’s there because XPath can be used from other environments besides XSLT, and some may use different function libraries, so it’s useful to use namespaces to distinguish the functions as being from different libraries.
Example 16-2, the context.xsl stylesheet, produces four result trees based on functions.xml. The default result tree is text, and the three others are for XML, HTML, and XHTML output, respectively.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:output name="xml" method="xml" indent="yes"/>
<xsl:output name="html" method="html" indent="yes"/>
<xsl:output name="xhtml" method="html" indent="yes"/>
<xsl:param name="dir">file:///C:/LearningXSLT/examples/ch16</xsl:param>

<xsl:template match="functions">
 <xsl:text>XPath 2.0 Context Functions&#10;</xsl:text>
 <xsl:text>Date: </xsl:text>
 <xsl:value-of select="current-date( )"/>
 <xsl:text>&#10;</xsl:text>
 <xsl:apply-templates select="function" mode="text"/>
 <xsl:result-document format="xml" href="{$dir}/context.xml">
  <xsl:message terminate="no">Printing text result tree...</xsl:message>
  <list>
   <description>XPath 2.0 Context Functions</description>
   <date><xsl:value-of select="current-date( )"/></date>
   <xsl:message terminate="no">Printing XML result tree in context.xml...</xsl:message>
   <xsl:apply-templates select="function" mode="xml"/>
  </list>
 </xsl:result-document>
 <xsl:result-document format="html" href="{$dir}/context.html">
  <xsl:message terminate="no">Printing HTML result tree in context.html...</xsl:message>
  <html>
   <body>
    <h2>XPath 2.0 Context Functions</h2>
    <h3>Date: <xsl:value-of select="current-date( )"/></h3>
    <ul>
     <xsl:apply-templates select="function" mode="html"/>
    </ul>
   </body>
  </html>
 </xsl:result-document>
 <xsl:result-document format="xhtml" href="{$dir}/context-x.html">
  <xsl:message terminate="no">Printing XHTML result tree in context-x.html...</xsl:message>
  <html xmlns="http://www.w3.org/1999/xhtml">
   <body>
    <h2>XPath 2.0 Context Functions</h2>
    <h3>Date: <xsl:value-of select="current-date( )"/></h3>
    <ol>
     <xsl:apply-templates select="function" mode="xhtml"/>
    </ol>
   </body>
  </html>
 </xsl:result-document>
</xsl:template>

<xsl:template match="function" mode="text">
 <xsl:text> - </xsl:text>
 <xsl:value-of select="name"/>
 <xsl:text>&#10;</xsl:text>
</xsl:template>

<xsl:template match="function" mode="xml">
 <function><xsl:value-of select="name"/></function>
</xsl:template>

<xsl:template match="function" mode="html">
 <li><xsl:value-of select="name"/></li>
</xsl:template>

<xsl:template match="function" mode="xhtml">
 <li xmlns="http://www.w3.org/1999/xhtml"><xsl:value-of select="name"/></li>
</xsl:template>

</xsl:stylesheet>
The version attribute on stylesheet shows the 2.0 version number. There are four output elements, three of which are named. This allows a result-document element to reference an output element by name, and hence to use the information in it. A global parameter named dir holds the name of the directory where three of the result trees are written as files. This information is referenced by the attribute value template {$dir} in the href attributes on the result-document elements. You could pass in a new value for the dir parameter if you want to change the destination of the output.
The template matching functions creates a text result tree, plus three other result trees inside result-document elements. Each result tree issues its own message using the message element. Each result tree also applies templates to the function elements, though each in a different mode (text, xml, html, and xhtml). The different modes for each result help create an appropriate tree for each of the given formats. The new current-date( ) function is called in each result tree, too.
To get this to work, you need to use a full Java version of Saxon, preferably Version 7.7 or later, available from http://saxon.sourceforge.net or in the examples/ch16 directory as saxon7-7.zip (the JAR file saxon7.jar has already been extracted from saxon7-7.zip). For specific instructions on how to download, install, and use Saxon with the Java interpreter, see the appendix.
Once everything is installed and working, you can type this command:
java -jar saxon7.jar functions.xml context.xsl
and you will get the following text result tree, plus messages about the other three:
Printing text result tree...
Printing XML result tree in context.xml...
Printing HTML result tree in context.html...
Printing XHTML result tree in context-x.html...
XPath 2.0 Context Functions
Date: 2003-08-26
 - fn:context-item( )
 - fn:position( )
 - fn:last( )
 - fn:current-dateTime( )
 - fn:current-date( )
 - fn:current-time( )
 - fn:default-collation( )
 - fn:implicit-timezone( )
The files that the three result-document elements produced contain the other result trees. The first one is context.xml:
<?xml version="1.0" encoding="UTF-8"?>
<list>
   <description>XPath 2.0 Context Functions</description>
   <date>2003-10-03</date>
   <function>fn:context-item( )</function>
   <function>fn:position( )</function>
   <function>fn:last( )</function>
   <function>fn:current-dateTime( )</function>
   <function>fn:current-date( )</function>
   <function>fn:current-time( )</function>
   <function>fn:default-collation( )</function>
   <function>fn:implicit-timezone( )</function>
</list>
The second is context.html, an HTML document that uses an unordered (bulleted) list:
<html>
   <body>
      <h2>XPath 2.0 Context Functions</h2>
      <h3>Date: 2003-10-03</h3>
      <ul>
         <li>fn:context-item( )</li>
         <li>fn:position( )</li>
         <li>fn:last( )</li>
         <li>fn:current-dateTime( )</li>
         <li>fn:current-date( )</li>
         <li>fn:current-time( )</li>
         <li>fn:default-collation( )</li>
         <li>fn:implicit-timezone( )</li>
      </ul>
   </body>
</html>
And the third is context-x.html, an XHTML document that uses an ordered (numbered) list:
<html xmlns="http://www.w3.org/1999/xhtml">
   <body>
      <h2>XPath 2.0 Context Functions</h2>
      <h3>Date: 2003-10-03</h3>
      <ol>
         <li>fn:context-item( )</li>
         <li>fn:position( )</li>
         <li>fn:last( )</li>
         <li>fn:current-dateTime( )</li>
         <li>fn:current-date( )</li>
         <li>fn:current-time( )</li>
         <li>fn:default-collation( )</li>
         <li>fn:implicit-timezone( )</li>
      </ol>
   </body>
</html>
As you can see, result-document makes it convenient to create more than one result tree from just one stylesheet. Next is an example that uses regular expressions.
Regular expressions allow you to define specific patterns for searching strings of text. XML Schema supports regular expressions, and XSLT 2.0 relies on XML Schema-style regular expressions. Table 16-1 shows a sampling of symbols used in regular expressions that XSLT 2.0 supports. The table represents only a few of the possibilities.
Regular Expression | Description |
. | Matches any character except a newline or carriage return. |
 | Matches any character. |
 | Matches any single character. |
\s | Matches any whitespace character, including a space, tab, newline, or carriage return. |
\S | Matches any character except a whitespace character. |
\d | Matches any digit. |
\d{3} | Matches any three digits. |
\D | Matches any character except a digit. |
^ | Matches the beginning of a line. |
$ | Matches the end of a line. |
[a-z]{5} | Matches any five lowercase letters. |
[A-Z]{6} | Matches any six uppercase letters. |
\p{P} | Matches any single punctuation character. |
In regular expressions, you can mix these symbols with actual characters to form a search string. For example, using these symbols, you could match:
A U.S.-style 9-digit ZIP code, such as 10048-1000, with \d{5}-\d{4}
A U.S.-style 10-digit phone number, such as (800)555-1234, with \(\d{3}\)\d{3}-\d{4}
The word The at the beginning of a line, followed by zero or more whitespace characters, with the expression ^The\s*
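These patterns can be tried out with the matches( ) function. The sketch below tests hard-coded strings, so it runs against any source document; the $ anchor is added to the first two patterns so that the whole string must match. The select attribute is not an attribute value template, so the curly braces need no special treatment there.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Five digits, a hyphen, four digits: writes true -->
    <xsl:value-of select="matches('10048-1000', '^\d{5}-\d{4}$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The four-digit extension is missing: writes false -->
    <xsl:value-of select="matches('10048', '^\d{5}-\d{4}$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Literal parentheses are escaped with backslashes: writes true -->
    <xsl:value-of
        select="matches('(800)555-1234', '^\(\d{3}\)\d{3}-\d{4}$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The at the start, followed by whitespace: writes true -->
    <xsl:value-of select="matches('The end', '^The\s')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```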
XPath 2.0 adds three new functions for use with regular expressions: matches( ), replace( ), and tokenize( ). For more information on these new functions, see Section 7.5 of the functions and operators specification for XPath 2.0 and XQuery 1.0 at http://www.w3.org/TR/xpath-functions/. XSLT 2.0 offers the new analyze-string element. See Section 15 of the XSLT 2.0 spec at http://www.w3.org/TR/xslt20/ for more information on that. I’ll show you examples of the matches( ) and replace( ) functions, and the analyze-string element.
The tokenize( ) function is not covered by a numbered example in this chapter. It breaks a string into tokens, using a regular expression, such as one or more whitespace characters (\s+), to find the separators between them.
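The chapter’s numbered examples leave tokenize( ) out, but a minimal sketch is easy to give. The one below splits a hard-coded string on runs of whitespace and wraps each token in an element; the list and function element names simply echo the other examples in this chapter.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <list>
      <!-- Split on one or more whitespace characters; for-each can
           iterate over atomic values in XSLT 2.0, and each token
           becomes the context item in turn -->
      <xsl:for-each
          select="tokenize('last  position current-date', '\s+')">
        <function><xsl:value-of select="."/></function>
      </xsl:for-each>
    </list>
  </xsl:template>

</xsl:stylesheet>
```

The input string holds three tokens, so the result contains three function elements, even though two spaces separate the first pair.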
The function matches( ) is new in XPath 2.0. This function returns an xs:boolean value that indicates whether the value in the first argument matches the regular expression in the value of the second argument. The stylesheet match.xsl, in Example 16-3, uses the matches( ) function to test whether a string matches a regular expression.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="functions">
 <xsl:element name="list">
  <xsl:element name="description">XPath 2.0 Context Functions</xsl:element>
  <xsl:element name="date">
   <xsl:value-of select="current-date( )"/>
  </xsl:element>
  <xsl:apply-templates select="function"/>
 </xsl:element>
</xsl:template>

<xsl:template match="function">
 <xsl:copy>
  <xsl:if test="matches(name,'^fn:')">
   <xsl:value-of select="substring(name, 4)"/>
  </xsl:if>
 </xsl:copy>
</xsl:template>

</xsl:stylesheet>
The first template rule uses a new XPath 2.0 function, current-date( ), to insert the current date into a date element in the result tree, then it applies templates for function elements. In the second template rule, the first argument of matches( ) is name, a child node of function. The content of name is the string that this function attempts to match. The second argument is a regular expression. ^fn: looks for the letters fn: at the beginning of the line (^). If matches( ) finds ^fn: and returns true, the value-of element in the template of if writes a substring from the content of name beginning from the fourth character, thus eliminating fn:.
Transform functions.xml with match.xsl with:
java -jar saxon7.jar functions.xml match.xsl
and you will see this result:
<?xml version="1.0" encoding="UTF-8"?>
<list>
   <description>XPath 2.0 Context Functions</description>
   <date>2003-10-03</date>
   <function>context-item( )</function>
   <function>position( )</function>
   <function>last( )</function>
   <function>current-dateTime( )</function>
   <function>current-date( )</function>
   <function>current-time( )</function>
   <function>default-collation( )</function>
   <function>implicit-timezone( )</function>
</list>
The new replace( ) function in XPath 2.0 returns the value of the first argument with every substring matched by the regular expression in the second argument replaced by the string in the third argument. Example 16-4, the stylesheet replace.xsl, will show you how it works.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="functions">
 <xsl:element name="list">
  <xsl:element name="description">XPath 2.0 Context Functions</xsl:element>
  <xsl:element name="date">
   <xsl:value-of select="current-date( )"/>
  </xsl:element>
  <xsl:apply-templates select="function"/>
 </xsl:element>
</xsl:template>

<xsl:template match="function">
 <xsl:copy>
  <xsl:value-of select="replace(name, '^fn:', '')"/>
 </xsl:copy>
</xsl:template>

</xsl:stylesheet>
The first argument of replace( ) is the name element, meaning the content of the name element. The second argument is the regular expression you are looking for, and the third argument is the string that replaces each match (here, the empty string). If you process functions.xml with:
java -jar saxon7.jar functions.xml replace.xsl
it will produce the same output as match.xsl.
Finally, the instruction element analyze-string is also new in XSLT 2.0. This element allows you to select a string using the select attribute, and then search that string with a regular expression defined in a regex attribute. Two children can then appear inside analyze-string: matching-substring to define what happens when analyze-string finds a matching substring, and non-matching-substring to define what happens when it finds a non-matching substring. You can use either matching-substring or non-matching-substring or both. (Also, analyze-string accepts fallback as a child.)
The regex.xsl stylesheet, Example 16-5, uses analyze-string to handle some text in a node.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="functions">
 <xsl:element name="list">
  <xsl:element name="description">XPath 2.0 Context Functions</xsl:element>
  <xsl:element name="date">
   <xsl:value-of select="current-date( )"/>
  </xsl:element>
  <xsl:apply-templates select="function"/>
 </xsl:element>
</xsl:template>

<xsl:template match="function">
 <xsl:copy>
  <xsl:analyze-string select="name" regex="^fn:">
   <xsl:matching-substring></xsl:matching-substring>
   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:copy>
</xsl:template>

</xsl:stylesheet>
The second template searches the content of the name child of each function element in the source tree. When analyze-string finds the string fn: at the beginning of a line, it replaces the matching substring with nothing in the result tree and outputs the non-matching substring as is using value-of.
Execute the transformation with this command:
java -jar saxon7.jar functions.xml regex.xsl
and you will get the following result:
<?xml version="1.0" encoding="UTF-8"?>
<list>
   <description>XPath 2.0 Context Functions</description>
   <date>2003-08-26</date>
   <function>context-item( )</function>
   <function>position( )</function>
   <function>last( )</function>
   <function>current-dateTime( )</function>
   <function>current-date( )</function>
   <function>current-time( )</function>
   <function>default-collation( )</function>
   <function>implicit-timezone( )</function>
</list>
This same effect can be achieved by using replace( ) or even matches( ), as you saw earlier. The main reason for using analyze-string is when the replacement text contains elements; for example, you could use analyze-string to replace a line break with a br element.
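The line-break case just mentioned might be sketched as follows. The template assumes a description element whose text contains line breaks, and wrapping the result in a p element is a hypothetical choice. Note that regex is an attribute value template, so any curly braces in a pattern there would have to be doubled; \n contains none.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="description">
    <p>
      <xsl:analyze-string select="." regex="\n">
        <!-- Each newline becomes an empty br element -->
        <xsl:matching-substring>
          <br/>
        </xsl:matching-substring>
        <!-- Everything between newlines is copied through as text -->
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

This is something replace( ) cannot do on its own, because its third argument is a string and cannot contain an element node.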
These examples give you a taste of what is possible using regular expressions. For more information on the regular expressions used by XML Schema, and XSLT 2.0 by association, see http://www.w3.org/TR/xmlschema-0.html#regexAppendix and http://www.w3.org/TR/xmlschema-2.html#regexs.
Grouping in XSLT is the process by which you can group nodes based on a given criterion. In XSLT 1.0, the process is a little complicated and requires somewhat elaborate expressions, often employing the preceding-sibling axis to check whether a node belongs to a group. You could also group nodes with a key using the Muenchian method, which was demonstrated in Chapter 11. You can also read about how to do XSLT 1.0 grouping in Chapter 6 of Doug Tidwell’s XSLT (O’Reilly) or in Chapter 9 of Michael Kay’s XSLT Programmer’s Reference, Second Edition (Wrox). I prefer grouping in XSLT 2.0 because it is much simpler and easier to explain, the ease of which probably grew out of my experience with grouping in Version 1.0.
Grouping in XSLT 1.0 usually brings the for-each instruction element into service. XSLT 2.0 has a new instruction element called for-each-group that makes grouping a relative snap. I’ll show you how in the following example.
Glance at group2.xml, in Example 16-6, which lumps XPath 2.0’s context-related functions into two piles by labeling them with a type attribute.
<?xml version="1.0"?>

<list>
 <description>XPath 2.0 Context Functions</description>
 <date>2003-10-03</date>
 <function type="new">context-item( )</function>
 <function type="new">current-date( )</function>
 <function type="new">current-dateTime( )</function>
 <function type="new">current-time( )</function>
 <function type="new">default-collation( )</function>
 <function type="new">implicit-timezone( )</function>
 <function type="legacy">last( )</function>
 <function type="legacy">position( )</function>
</list>
The eight functions in this list are either legacy or new functions. The group2.xsl stylesheet, in Example 16-7, groups the functions in group2.xml according to the content of the type attribute.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="list">
 <xsl:copy>
  <xsl:for-each-group select="function" group-by="@type">
   <functions type="{@type}">
    <xsl:value-of select="current-group( )" separator=", "/>
   </functions>
  </xsl:for-each-group>
 </xsl:copy>
</xsl:template>

</xsl:stylesheet>
The
for-each-group
function selects the
node-set to group with the select
attribute—all function
children of
list
, that is. The group-by
attribute determines the key for grouping, which, in this case, is
the content of the type
attribute in the source.
The functions
literal result element uses an
attribute value template to reflect the value of the
type
attribute.
The value-of element’s select attribute uses the current-group( ) function, also a new kid on the block in XSLT 2.0, to retrieve the nodes that belong to the group currently being processed. The separator attribute is also a new addition to XSLT 2.0. It tells the XSLT 2.0 processor to write a comma followed by a space between adjacent nodes as they are sent to the result tree, so nothing is written after the last node.
In XSLT 1.0, value-of outputs only the first node of a returned node-set in string form; in XSLT 2.0, all nodes can be returned, so you have to plan accordingly.
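As a minimal sketch of that difference, the stylesheet below assumes the group2.xml source from Example 16-6 and a processor that handles value-of as described above. The first value-of writes every selected function; the second adds a [1] predicate to get back the single-node result that a 1.0 processor would have produced. The " | " separator is chosen only for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="list">
    <!-- 2.0 behavior: every selected node is written,
         with the separator placed between adjacent values -->
    <xsl:value-of select="function" separator=" | "/>
    <xsl:text>&#10;</xsl:text>
    <!-- Explicitly select the first node to mimic 1.0 output -->
    <xsl:value-of select="function[1]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

If a stylesheet written for 1.0 relies on value-of silently dropping all but the first node, adding the [1] predicate is one way to keep its output stable when moving to 2.0.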
As you might guess, for-each-group has several other attributes, namely, group-adjacent, group-starting-with, group-ending-with, and collation. I’m not going to cover them here, but you can read more about for-each-group and its attributes in Section 14 of the XSLT 2.0 specification.
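Just to give a taste of one of them, here is a hedged sketch that swaps group-by for group-adjacent. It is an illustration, not one of this chapter’s tested examples. It assumes the group2.xml source, that every function element carries a type attribute, and that the processor supports the current-grouping-key( ) function defined in the specification. The count attribute on the result element is made up for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="list">
    <xsl:copy>
      <!-- A new group starts whenever the type value changes
           from one function element to the next -->
      <xsl:for-each-group select="function" group-adjacent="@type">
        <functions type="{current-grouping-key( )}"
                   count="{count(current-group( ))}">
          <xsl:value-of select="current-group( )" separator=", "/>
        </functions>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the new functions and the legacy functions each sit together in group2.xml, this should produce the same two groups as group-by. If the types were interleaved in the source, group-adjacent would start a fresh group at each change, while group-by would still gather all matching nodes into one group per type.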
Use this command to transform group2.xml:
java -jar saxon7.jar group2.xml group2.xsl
The result is two lists of functions, grouped and comma-separated, in functions elements:
<?xml version="1.0" encoding="UTF-8"?>
<list>
   <functions type="new">context-item( ), current-date( ), current-dateTime( ), current-time( ), default-collation( ), implicit-timezone( )</functions>
   <functions type="legacy">last( ), position( )</functions>
</list>
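For contrast, here is a rough sketch of how the same grouping might be written in XSLT 1.0 with the Muenchian method. It assumes the same group2.xml source; the key name by-type is hypothetical, and the separator has to be written by hand because value-of has no separator attribute in 1.0.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every function element by its type attribute -->
  <xsl:key name="by-type" match="function" use="@type"/>

  <xsl:template match="list">
    <xsl:copy>
      <!-- Visit only the first function of each distinct type -->
      <xsl:for-each select="function[generate-id( ) =
                            generate-id(key('by-type', @type)[1])]">
        <functions type="{@type}">
          <!-- Then walk all functions that share that type -->
          <xsl:for-each select="key('by-type', @type)">
            <xsl:value-of select="."/>
            <xsl:if test="position( ) != last( )">, </xsl:if>
          </xsl:for-each>
        </functions>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The key declaration, the generate-id( ) comparison, and the nested for-each do the work that for-each-group, current-group( ), and separator handle in Example 16-7.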
This example should give you a feel for how to group nodes in XSLT 2.0. In the example that follows, you will learn how to use the new top-level function element.
You learned about external extension functions in the last chapter. In XSLT 2.0, you can now define your own functions at the stylesheet level using the function element. These are called stylesheet functions, and they are called from expressions just like any extension function. The difference is that they are completely portable between one XSLT 2.0 processor and another.
Example 16-8, function.xsl, uses function to declare a stylesheet function.
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes"
 xmlns:wy="http://www.wyeast.net/functions">
<xsl:output method="text"/>

<xsl:function name="wy:kilometers">
 <xsl:param name="miles" as="xs:decimal"/>
 <xsl:sequence select="$miles * 1.609347"/>
</xsl:function>

<xsl:template match="/">
 <xsl:apply-templates select="trip"/>
</xsl:template>

<xsl:template match="trip">
 <xsl:apply-templates select="distance"/>
</xsl:template>

<xsl:template match="distance">
 <xsl:text>The distance from </xsl:text>
 <xsl:value-of select="location"/>
 <xsl:text> to </xsl:text>
 <xsl:value-of select="destination"/>
 <xsl:text> is </xsl:text>
 <xsl:value-of select="round(wy:kilometers(miles))"/>
 <xsl:text> kilometers. </xsl:text>
</xsl:template>

</xsl:stylesheet>
When I tested this, it appeared that stylesheet functions must have at least one argument, but this may not be the case, given that the 2.0 specification is still a working draft. Stylesheet functions must also be identified with a QName that uses a prefix (this is to ensure that user-defined functions don’t clash with system-defined functions). The namespace URI and prefix associated with the QName in this example are http://www.wyeast.net/functions and wy, respectively. The namespace is declared on the stylesheet element.
The function element must be on the top level and declares the stylesheet function named wy:kilometers( ). The function performs a simple conversion of miles to kilometers by accepting a single parameter, miles. Parameters for stylesheet functions are defined with param elements but cannot have default values. The new as attribute on param declares that the value of miles must be an xs:decimal, as defined by XML Schema datatypes (the namespace is declared on the document element).
The new XSLT 2.0 sequence element adds a sequence of nodes or atomic values to the result tree. In this case, it returns a product (a single atomic value) and works much like value-of. In other situations, you can add existing nodes to a sequence with this element, not just new ones. The factor for converting miles to kilometers (1.609347) comes from the National Institute of Standards and Technology (NIST), and is based on the U.S. survey foot (see http://physics.nist.gov/Pubs/SP811/appenB8.html).
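To illustrate the point about existing nodes, here is a small sketch of a stylesheet function whose sequence element hands back elements that already exist in the source tree instead of computing a new value. The function name wy:longer-than and its two parameters are hypothetical, and the sketch assumes a source document with a trip element that holds distance children, each with location, destination, and miles children. The as attribute is left off the parameters to keep the example short.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wy="http://www.wyeast.net/functions">
  <xsl:output method="text"/>

  <!-- Hypothetical function: returns the distance elements
       whose miles value exceeds the given minimum -->
  <xsl:function name="wy:longer-than">
    <xsl:param name="distances"/>
    <xsl:param name="minimum"/>
    <!-- These are nodes from the source tree, not copies -->
    <xsl:sequence select="$distances[number(miles) &gt; $minimum]"/>
  </xsl:function>

  <xsl:template match="trip">
    <!-- The returned nodes can be navigated like any others -->
    <xsl:for-each select="wy:longer-than(distance, 100)">
      <xsl:value-of select="location"/>
      <xsl:text> to </xsl:text>
      <xsl:value-of select="destination"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the function returns the original distance elements, the for-each can still reach their location and destination children, something that would not be possible if the function returned only a string or number.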
The wy:kilometers( ) function is called later in the stylesheet in a value-of element. It takes a miles node as an argument, and its return value is rounded up or down with the round( ) function. The result is output as text, embedded in a sentence formed from the nodes in the source tree.
Soon, you’ll apply this stylesheet to trip.xml, shown in Example 16-9, which holds the road mileage between several U.S. cities.
<?xml version="1.0"?>
<trip>
 <distance>
  <location>Tucson</location>
  <destination>Flagstaff</destination>
  <miles>253</miles>
 </distance>
 <distance>
  <location>Portland</location>
  <destination>Medford</destination>
  <miles>272</miles>
 </distance>
 <distance>
  <location>Denver</location>
  <destination>Colorado Springs</destination>
  <miles>67</miles>
 </distance>
</trip>
Perform the transformation with:
java -jar saxon7.jar trip.xml function.xsl
You will see this outcome on your screen:
The distance from Tucson to Flagstaff is 407 kilometers. The distance from Portland to Medford is 438 kilometers. The distance from Denver to Colorado Springs is 108 kilometers.
The wy:kilometers( ) stylesheet function may be reused as often as you need it in this stylesheet. A stylesheet function can also be included or imported from another stylesheet.
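As a sketch of that kind of reuse, the function could live in a small library stylesheet of its own. The file name wy-functions.xsl is hypothetical, and this version omits the as attribute on the parameter for brevity:

```xml
<!-- wy-functions.xsl (hypothetical file name) -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wy="http://www.wyeast.net/functions">

  <xsl:function name="wy:kilometers">
    <xsl:param name="miles"/>
    <xsl:sequence select="$miles * 1.609347"/>
  </xsl:function>

</xsl:stylesheet>
```

Another stylesheet can then import the library and call the function, provided it declares the same namespace URI for the prefix it uses. The templates below are illustrative only and assume a source like trip.xml:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wy="http://www.wyeast.net/functions">
  <!-- import must come before any other top-level element -->
  <xsl:import href="wy-functions.xsl"/>
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="miles">
    <xsl:value-of select="round(wy:kilometers(.))"/>
    <xsl:text> km&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress the text of the other children of distance -->
  <xsl:template match="location | destination"/>

</xsl:stylesheet>
```

Using include instead of import would work the same way here; the difference only matters when the importing stylesheet needs to override a declaration from the library.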
XSLT 2.0 and XPath 2.0 offer an almost overwhelming number of new features. Some have complained about the new versions of XSLT and XPath on this count. Personally, I like most of the new offerings and, fortunately, no one is forced to adopt all the new functionality. Nevertheless, the terminology will definitely require devotees to plow deeply into the new specifications in order to get a grip on it.
This chapter lightly introduced you to many highlights from these new technologies. It also walked you through how to output multiple result documents, define and use regular expressions, use grouping, and create stylesheet functions.
The next chapter shows you how to use APIs to write your own programming interface to an XSLT processor.