Chapter 18

Ten Most Confusing Things About XSLT

In This Chapter

• Built-in templates do things behind the scenes

• Think of XML documents as trees

• What a root node is and isn’t

• Why the selected node isn’t the same as the current node

• XPath abbreviations

• Location steps and paths

• Using xsl:apply-templates, xsl:copy, xsl:copy-of, and xsl:value-of

• How the axis dictates the order of the selected node set

• Types of XPath expressions

• When to use curly brackets

• Whitespace in your result documents

XSLT may be logically structured, but it sure does have some peculiarities that can leave you scratching your head if you don’t consider them as you create your stylesheets.

Built-In Template: The Man Behind the Screen

A built-in template rule is like the man behind the screen in the Wizard of Oz — it takes action, but if you don’t realize it, you’ll be confused about what happened and why.

The XSLT processor uses built-in template rules to process any node that is not matched with a template rule you explicitly define in your stylesheet. Each node type has different built-in template rules that are applied to it:

• Element nodes (and the root node) have a built-in template rule that applies templates to their child nodes. If none of your own template rules match along the way, the net effect is that the element tags are dropped but the text content is preserved.

• Text and attribute nodes have a built-in template rule that copies their text straight into the result tree.

• Processing instructions, comments, and namespaces have a built-in template rule that does nothing, so they are left out of the result document.
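
The built-in rules behave roughly as if the following template rules were part of every stylesheet. This is a sketch based on the XSLT 1.0 built-in rules; the explicit versions are shown only to make the hidden behavior visible, and you would not normally write them yourself:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root and element nodes: keep walking down to the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the node's value to the result -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: produce nothing -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Two details are worth noting. No pattern can match a namespace node, so there is nothing to write for namespaces. And because xsl:apply-templates selects only child nodes by default, the attribute rule comes into play only when you select attributes explicitly, for example with select="@*".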

See Chapter 4 for more information on built-in templates.

Thar’s Trees in Them Documents

Always keep in mind that an XSLT processor doesn’t read an XML document sequentially — one tag at a time — as you or I do; instead, the processor treats the source like a tree-like structure of hierarchical information. Within that tree, relationships among the various parts dictate how the processor reads and navigates the document during the transformation process.

Each XML document has a main document element that contains all the other elements inside its open and close tags. An xsl:stylesheet element, for example, contains template rules and all other parts of an XSLT stylesheet, so it acts as the document element of an XSLT stylesheet. Child elements of the document element are the equivalent of the first-level branches of a tree. These child elements may also have children, much like smaller branches. The XSLT processor works its way through the entire tree until it retrieves each leaf and branch and assembles it based on this hierarchy.

Each leaf and branch in the document tree is called a node. Elements are the most common type of node that you work with, but there are actually seven different node types: root, element, attribute, namespace, processing instruction, comment, and text. With that in mind, an element node has children not only when it contains other elements, but also when it contains text, comments, or processing instructions. Attributes belong to their element too, although XPath treats the element as the attribute’s parent without counting the attribute as one of its children.
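
To see the different node types in a document of your own, you can count them with one XPath node test per type. The following stylesheet is a minimal sketch that should work against any well-formed source document; the labels in the output are my own wording:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- There is always exactly one root node -->
    <xsl:text>root nodes: </xsl:text>
    <xsl:value-of select="count(/)"/>
    <xsl:text>&#10;elements: </xsl:text>
    <xsl:value-of select="count(//*)"/>
    <xsl:text>&#10;attributes: </xsl:text>
    <xsl:value-of select="count(//@*)"/>
    <xsl:text>&#10;text nodes: </xsl:text>
    <xsl:value-of select="count(//text())"/>
    <xsl:text>&#10;comments: </xsl:text>
    <xsl:value-of select="count(//comment())"/>
    <xsl:text>&#10;processing instructions: </xsl:text>
    <xsl:value-of select="count(//processing-instruction())"/>
    <!-- Namespace nodes are reached through their own axis -->
    <xsl:text>&#10;namespace nodes: </xsl:text>
    <xsl:value-of select="count(//namespace::*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The text node count is often higher than you expect, because the line breaks and indentation between tags are text nodes too.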

If you want more information on document trees, check out Chapter 3.

Getting to the Root of the Issue

At first glance, you may naturally look at the following XML snippet and conclude that animals is the root node of this document tree:

<animals>

  <cats>

    <tigers/>

    <lions/>

    <tabby/>

  </cats>

  <dogs>

    <collie/>

    <doberman/>

  </dogs>   

</animals>

Although animals is the highest level element (known as the document element), it is not the root node. The root node is a “built-in” node and automatically serves as the ancestor of all nodes in the document tree. You never actually see the root node show up in your document — it’s just there; a given. Therefore, in the preceding example, animals is a child of the root node.

To demonstrate, the following template rule uses / to retrieve the root node:

<xsl:template match="/">

<!-- Do something -->

</xsl:template>

When run against the preceding XML snippet, this template rule matches not the animals element but the root node above it in the tree hierarchy.
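
One way to convince yourself is to ask the matched node for its name. The sketch below assumes the animals snippet shown earlier as the source; the square brackets are only there to make an empty name visible:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The root node has no name, so name(.) is an empty string -->
    <xsl:text>root name: [</xsl:text>
    <xsl:value-of select="name(.)"/>
    <xsl:text>]&#10;</xsl:text>

    <!-- The document element is the element child of the root node -->
    <xsl:text>document element: [</xsl:text>
    <xsl:value-of select="name(*)"/>
    <xsl:text>]&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first line of output shows empty brackets, and the second shows animals.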

Why the Selected Node Is Not the Same as the Current Node

The current node (or context node) of a document tree is the node that the XSLT processor is “on” during its walk through the tree. However, don’t confuse the current node with the selected node or nodes. The current node is only the starting point for a given location step (an XPath expression used to retrieve nodes from a source tree); it is the location step that determines which node or set of nodes is selected.
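
The difference is easier to see in a template rule. In this sketch, which assumes the animals snippet from the preceding section as the source, the current node is cats, but the location path ../dogs/* selects nodes that live somewhere else entirely:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="animals/cats"/>
  </xsl:template>

  <xsl:template match="cats">
    <!-- Current node: the cats element that this rule matched -->
    <xsl:text>current node: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Selected nodes: go up to animals, over to dogs, down to its children -->
    <xsl:for-each select="../dogs/*">
      <xsl:text>selected node: </xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The output lists cats as the current node, followed by collie and doberman as the selected nodes. Inside xsl:for-each, each selected node becomes the current node in turn, which is why name() returns a different value there.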

Those //@.}* Abbreviations

XPath allows you to use abbreviations to write the axis part of a location step. These shortcuts enable you to write XPath expressions more quickly, but they can also be confusing until you learn the clipped syntax. The ones to memorize appear in Table 18-1.

Table 18-1 XPath Abbreviations

Axis                            Abbreviation
child::                         (none) The default axis, so you can leave it off.
attribute::                     @
self::node()                    . (single period)
parent::node()                  .. (double period)
/descendant-or-self::node()/    //
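
To check that the short and long forms really mean the same thing, you can evaluate them side by side. The following sketch assumes the animals snippet from earlier in the chapter as the source and prints each abbreviated result next to its unabbreviated twin:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- child:: is the default axis -->
    <xsl:value-of select="count(animals/cats/*)"/>
    <xsl:text> = </xsl:text>
    <xsl:value-of select="count(child::animals/child::cats/child::*)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- // stands for /descendant-or-self::node()/ -->
    <xsl:value-of select="count(//dogs/*)"/>
    <xsl:text> = </xsl:text>
    <xsl:value-of
        select="count(/descendant-or-self::node()/child::dogs/child::*)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- @ stands for attribute:: -->
    <xsl:value-of select="count(//@*)"/>
    <xsl:text> = </xsl:text>
    <xsl:value-of
        select="count(/descendant-or-self::node()/attribute::*)"/>
    <xsl:text>&#10;</xsl:text>

    <xsl:apply-templates select="animals/cats"/>
  </xsl:template>

  <xsl:template match="cats">
    <!-- . is self::node() and .. is parent::node() -->
    <xsl:value-of select="concat(name(.), ' in ', name(..))"/>
    <xsl:text> = </xsl:text>
    <xsl:value-of
        select="concat(name(self::node()), ' in ', name(parent::node()))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Every line of output should show the same value on both sides of the equal sign.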

For more information on axes, see Chapter 5.

To Apply or Copy, That Is the Question

When you create result trees, you may not be sure when to use xsl:apply-templates, xsl:copy, xsl:copy-of, xsl:value-of, and other XSLT instructions inside your template rules. The following guidelines can help you decide what to do:

• Use xsl:apply-templates when you want to process the children of the current node with whatever template rules match them. If only the built-in rules apply, you get the text content of the current element and its children, but not the surrounding element tags.

• Use xsl:copy to preserve the current node’s start and end tags during processing, but not its children or attributes. Content inside the tags is included only if you add an xsl:apply-templates instruction inside the xsl:copy element.

• Use xsl:copy-of when you want to copy the whole kit ’n caboodle: the node’s tags, content, attributes, and children. This instruction copies all the nodes returned from its required select attribute.

• Use xsl:value-of when you want to convert the result to text. The conversion process removes all tags and elements. If the result is a single node, its content is converted to text. If the result is a node set, the first node in the set (in document order) is used in the conversion.
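
The guidelines above can be compared side by side in a single template rule. This is a sketch only: the source document is an assumption spelled out in the comment, and the wrapper element names (compare, applied, copied, and so on) are made up for the illustration:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- Assumed source, written without whitespace between tags:
       <film name="Henry V"><director>Kenneth Branagh</director></film> -->

  <xsl:template match="film">
    <compare>
      <!-- Children go through the built-in rules: text only -->
      <applied><xsl:apply-templates/></applied>

      <!-- Shallow copy: an empty film element, no attribute, no children -->
      <copied><xsl:copy/></copied>

      <!-- Shallow copy plus processed children: film tags around the text -->
      <copied-applied>
        <xsl:copy><xsl:apply-templates/></xsl:copy>
      </copied-applied>

      <!-- Deep copy: the film element with its attribute and children -->
      <deep><xsl:copy-of select="."/></deep>

      <!-- String value: the text content with all markup removed -->
      <value><xsl:value-of select="."/></value>
    </compare>
  </xsl:template>
</xsl:stylesheet>
```

Comparing the five wrapper elements in the result shows how much of the original film element each instruction carries over.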

I explain these instructions fully in Chapter 4.

Walk This Way

When the XSLT processor walks the tree to select nodes, the axis part of the location step specifies the direction in which the processor walks. Each of the following axes goes top-to-bottom, left-to-right (that is, in document order), much like you read a page in this book: child, self, parent, descendant, following-sibling, following, and descendant-or-self. The remaining axes (ancestor, ancestor-or-self, preceding, and preceding-sibling) travel in reverse order. Finally, when working with attribute and namespace axes, the nodes are always unordered.
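
Axis direction matters most when you use a position in a predicate. The sketch below assumes the animals snippet from earlier in the chapter and starts from the tabby element. On a reverse axis, position 1 is the nearest node rather than the first one in the document; wrapping the path in parentheses makes the position count in document order instead:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="animals/cats/tabby"/>
  </xsl:template>

  <xsl:template match="tabby">
    <!-- Reverse axis: [1] is the closest preceding sibling (lions) -->
    <xsl:value-of select="name(preceding-sibling::*[1])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Parenthesized: [1] is the first in document order (tigers) -->
    <xsl:value-of select="name((preceding-sibling::*)[1])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Reverse axis: [1] is the closest ancestor (cats) -->
    <xsl:value-of select="name(ancestor::*[1])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Parenthesized: [1] is the outermost ancestor (animals) -->
    <xsl:value-of select="name((ancestor::*)[1])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The four lines of output are lions, tigers, cats, and animals.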

Chapter 5 gives you more information on axis values.

Expressions, Paths, and Steps

XPath is used to create expressions, but some types of expressions are more important to XSLT than others. In a generic sense, an expression is a string of XPath instructions that the XSLT processor evaluates to produce a result, which may be a number, string, Boolean value, or a node set. However, XSLT is most interested in a particular kind of expression called a location path, which is a set of instructions that specify what nodes to bring back to the XSLT stylesheet. A location path, in turn, consists of a series of smaller parts called location steps, separated by slashes. A location step consists of an axis, a node test, and zero or more predicates, and takes the following form: axis::nodetest[predicate].
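
The four result types can be seen in one short template rule. This sketch assumes the animals snippet from earlier in the chapter as the source; the last expression is a location path written out in full, with three location steps and a predicate on the final step:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Number: how many elements the document contains (8) -->
    <xsl:value-of select="count(//*)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- String: the name of the document element (animals) -->
    <xsl:value-of select="name(/*)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Boolean: whether any tabby element exists (true) -->
    <xsl:value-of select="boolean(//tabby)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Node set from a location path: axis::nodetest[predicate] (lions) -->
    <xsl:value-of
        select="name(child::animals/child::cats/child::*[position() = 2])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because xsl:value-of always writes text, the number, the Boolean value, and the node set are each converted to a string on the way out.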

Check out Chapter 5 for more details on XPath expressions.

Those Cute Little Curly Braces

Curly braces are used in attribute value templates to tell the XSLT processor to evaluate what’s inside each of them as an expression, rather than as normal text. In the output tree, the curly braces and expression are replaced with a resulting string. However, keep in mind that curly braces only evaluate expressions inside attribute values, not outside them.

Consider the following XML snippet:

  <film name="Henry V">

     <director>Kenneth Branagh</director>

     <runtime>137</runtime> 

  </film>

Suppose I want to transform the preceding source by using the following XSLT code:

  <xsl:template match="film">

    <!-- Curly braces work inside of attributes -->

    <movie director="{director}" length="{runtime}"/>

    <movie newlength="{100+60}"/>

    <!-- Curly braces do not work here -->

    The director is {director} and length is {runtime} and newlength is {100+60}

    <!-- Instead, use xsl:value-of outside of attributes -->

    The director is <xsl:value-of select="director"/> and length is <xsl:value-of select="runtime"/> and newlength is <xsl:value-of select="100+60"/>

    <xsl:apply-templates/>

  </xsl:template>

  <xsl:template match="director"/>

  <xsl:template match="runtime"/>

The result is:

 <movie director="Kenneth Branagh" length="137" /><movie newlength="160" />

  The director is {director} and length is {runtime} and newlength is {100+60}

  The director is Kenneth Branagh and length is 137 and newlength is 160

In looking at the transformation, notice that the first part of the template surrounds the element name with curly braces to return the value of the director and runtime elements inside attribute values. In this context, XPath then evaluates director and runtime as element names rather than as plain text. Similarly, XPath evaluates 100+60 as an expression.

The second part of the template shows what happens when you try to use curly braces outside attribute values. These are simply treated as literal text in the output document.

The final part of the template illustrates how you use xsl:value-of to evaluate the same XPath expressions outside attribute values.

See Chapter 4 for more information on attribute value templates.

Whitespace, the Final Frontier

You need to think about several factors as you consider whitespace in your result document, because whitespace has origins in both your XSLT stylesheet and the underlying XML source document.

Inside the XSLT stylesheet, text nodes that consist only of whitespace are usually stripped out of the template before any transformation occurs. However, whitespace is preserved in the following cases:

• Text nodes that contain nonwhitespace characters.

• Any whitespace text appearing inside an xsl:text element.

• Whitespace text whose closest ancestor carrying an xml:space attribute sets it to the value of preserve.

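Whitespace inside the source XML document is handled a little differently. Under the XSLT 1.0 rules, all source whitespace is preserved by default, but you can declare your own default rules by using the top-level xsl:strip-space and xsl:preserve-space elements. xsl:strip-space names the elements whose whitespace-only text nodes are removed, and any text node that occurs inside an element named by xsl:preserve-space is preserved.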
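
Here is a small sketch that puts these rules together. It assumes the film snippet from the curly braces section as the source. The choice of elements to strip and preserve is only for illustration, and the preserve-space line has no visible effect on this particular source because director contains real text:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Remove whitespace-only text nodes from every source element -->
  <xsl:strip-space elements="*"/>

  <!-- The more specific name wins, so director keeps its whitespace -->
  <xsl:preserve-space elements="director"/>

  <xsl:template match="director">
    <!-- The line breaks around these instructions are whitespace-only
         text nodes in the stylesheet, so they never reach the result -->
    <xsl:value-of select="."/>
    <!-- Whitespace inside xsl:text is always kept -->
    <xsl:text>, </xsl:text>
  </xsl:template>

  <xsl:template match="runtime">
    <xsl:value-of select="."/>
    <xsl:text> minutes&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the strip-space declaration in place, the result should be a single line of text. If you remove that declaration, the indentation and line breaks between the source tags flow through the built-in text rule and show up in the output.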

See Chapter 13 for more details on whitespace.
