What Is XSLT, Anyway?

The goal of XSL has evolved over time. Today, XSL is a blanket term for a number of derived technologies that together refine and implement the original idea of styling XML documents. The components that fall under the XSL umbrella are the actual software entities you use in your code:

  • XSLT

    Rule-based language for transforming XML documents into any other text-based format. XSLT provides for XML-to-XML transformation, which mostly means schema transformation. An XSLT program is a generic set of transformation rules whose output can be any text-based language, including HTML, Rich Text Format (RTF), and Wireless Markup Language (WML), to name just a few.

  • XPath

    Query language that XSLT programs use to select specific parts of an XML document. The XSLT processor walks the source document and relies on XPath whenever it needs to select, match, or refer to particular groups of nodes; the resulting node-sets are then processed by the templates of the style sheet. XPath was covered in Chapter 6.

  • XSL Formatting Objects (XSL-FO)

    Advanced styling features expressed by an XML vocabulary that defines the semantics of a set of formatting elements. Many of the formatting properties are borrowed from CSS Level 2 (CSS2), but others have been added. (See the section “Further Reading,” on page 343, for more information.)

XSL and XSLT are not the same thing. XSL as a whole still refers to page styling, of which XML transformation to arbitrary text is just one aspect, albeit the most important one. This chapter focuses on the Microsoft .NET Framework implementation of XSLT. Before going any further with the .NET Framework core classes for data transformation, let’s briefly recap the main concepts of XSLT and the programming tools it provides to developers.

XSLT Template Programming

XSLT is a process that combines two XML documents—the XML source file and the style sheet—to produce a third document. The resultant document can be an XML document, an HTML page, or any text-based file the style sheet has been instructed to generate.

The source document must meet only one requirement: it must be a well-formed XML document. The style sheet must also be a well-formed XML document, and it contains the transformation logic expressed using the elements of the XSLT vocabulary. An XSLT style sheet can be seen as a sequence of templates. Each template takes one or more source elements as input and returns some output text based on literals as well as transformed input data. Figure 7-1 illustrates the transformation process.

Figure 7-1. An overview of the XSLT process.


The core part of the transformation process is the application of templates to XML source elements. Other ancillary steps might include the expansion of elements to text, the execution of some script code, and the selection of a subset of nodes using XPath queries. The layout of a generic XSLT script is shown here:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
  ⋮
  </xsl:template>
  <xsl:template match="...">
  ⋮
  </xsl:template>
  ⋮
</xsl:stylesheet>

The root element of an XSLT script is <stylesheet>, which belongs to the official W3C namespace for XSLT 1.0, http://www.w3.org/1999/XSL/Transform. Namespace URIs are case-sensitive, so the URI must be spelled exactly this way. (Note that the .NET Framework supports only XSLT 1.0, but the W3C committees are currently working on a draft of XSLT 1.1.) Below the <stylesheet> node are a variety of <template> nodes, each of which typically contains a match attribute. The match attribute contains an XPath pattern that selects the source node (or nodes) that will be used to fill the template.

The template consists of some output literal text interspersed with XSLT placeholder tags. At run time, the XSLT processor reads source data for any matching nodes and dynamically populates all the placeholders. The source markup text is poured into the template in various forms according to the particular XSLT instruction used. Text or attribute values can be copied as-is or preprocessed using script code or extension objects. In addition, you can apply some basic flow constructs such as if, choose/when, and for-each, as well as process nodes in a particular order or filtered by an ad hoc XPath expression.

The final output of each template must form a syntactically valid fragment in the target language, be it XML, HTML, RTF, or some other language. You are not required to indicate the target language explicitly, although the XSLT vocabulary provides a tailor-made instruction, <xsl:output>, to declare what the expected output will be. The main requirement for the XSLT style sheet is that its overall text be well-formed XML. In addition, it must make syntactically correct use of all the XSLT instructions it needs: the XSLT processor checks each embedded instruction against the syntax defined by the XSLT specification and reports an error if an instruction is malformed.
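In XSLT 1.0, the instruction that declares the expected output is the top-level <xsl:output> element. The following is a minimal sketch, assuming the employee document (data.xml) used later in this chapter; the method, indent, and encoding values are illustrative choices, not requirements.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level declaration of the expected result format;
       method can be "xml", "html", or "text" -->
  <xsl:output method="html" indent="yes" encoding="utf-8" />

  <xsl:template match="/">
    <HTML>
      <BODY>
        <!-- count() returns the number of Employee nodes in the document -->
        <P><xsl:value-of select="count(//Employee)" /> employees</P>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
```

When <xsl:output> is omitted, the processor produces XML, unless the first element of the result is named html (in any combination of case) and has no namespace, in which case it switches to HTML output.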

Although an XSLT style sheet is not necessarily composed of explicitly declared templates, in many real-world cases, it is. In other situations, you can have an XSLT style sheet that consists of plain XSLT instructions not grouped as individually callable templates.

A template is to the XSLT language what a function is to other high-level programming languages. You can group instructions into a function or a method, but you can also embed in the source program instructions that simply run sequentially.
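XSLT 1.0 formally supports one such form, in which a literal result element serves as the whole style sheet. The root element is the output markup itself, it carries an xsl:version attribute, and the processor treats it as the body of a single template that matches the root node. A sketch, assuming the data.xml structure used later in this chapter:

```xml
<!-- Simplified style sheet: no <xsl:stylesheet> and no explicit
     <xsl:template>; the whole document acts as a template for "/" -->
<HTML xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <BODY>
    <UL>
      <!-- Instructions run in sequence, relative to the root node -->
      <xsl:for-each select="MyDataSet/NorthwindEmployees/Employee">
        <LI><xsl:value-of select="lastname" /></LI>
      </xsl:for-each>
    </UL>
  </BODY>
</HTML>
```

This simplified syntax cannot contain top-level elements such as <xsl:output> or additional templates, so it is best suited to small, single-purpose transformations.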

In the body of an XSLT style sheet, a template is always defined with in-line code, but it can be configured, and subsequently invoked, in two ways: it can have implicit or explicit arguments. With implicit arguments, you use the match attribute to select the nodes for the template to process. In this case, you apply the template to the matching nodes.

With explicit arguments, you give the template a name and optionally some arguments and let other templates call it explicitly. Like a DLL function, the invoked template can try to determine its context by using XPath expressions, or it can work in isolation, using only the passed arguments. In this case, you call the template to operate on some arguments. We’ll look at some examples of template calls in the section “From XML to HTML,” on page 299. In the meantime, Figure 7-2 illustrates the process of applying templates to nodes.

Figure 7-2. Applying an XSLT template to source markup text.


XSLT Instructions

The XSLT vocabulary consists of special tags that represent particular operations you can perform on the source markup text or passed arguments. Although the overall syntax is that of a rigorous XML dialect, you can easily recognize the main constructs of a high-level programming language.

The following subsections summarize the main XSLT instructions you are likely to run across in your XSLT experience. The XSLT instructions are divided into four categories: templates, data manipulation, control flow, and layout.

Template Instructions

An XSLT template is a mixed-content template consisting of verbatim text and expandable placeholders. A template can be applied to a selected group of nodes as well as invoked by other templates with or without arguments. Table 7-1 lists the main commands for working with templates. All of these XSLT elements are qualified with the xsl prefix, but bear in mind that xsl is just an arbitrary, although common, namespace prefix. Feel free to replace it with another prefix in your own code.

Table 7-1. XSLT Instructions for Templates
Instruction Description
<xsl:template match="…" | name="…"> Defines the transformation rules for the nodes that match the XPath pattern set in the match attribute. Templates are applied to their nodes through the <xsl:apply-templates> command; the processor itself starts the process by applying templates to the root node. The instruction can also be used to declare a template that will then be called by name using the <xsl:call-template> command. In this case, use the name attribute instead of match.
<xsl:apply-templates select="…"> Selects the nodes that match the XPath expression in the select attribute and, for each of them, applies the best-matching template. If the select attribute is omitted, all the children of the current node are processed. A single element can match multiple templates, but only one of them is applied to each node; conflicts are resolved by import precedence and priority.
<xsl:call-template name="…"> Executes the specified template. The name attribute indicates the name of the previously declared template to execute.
<xsl:param name="…" select="…"> </xsl:param> Defines a formal argument for a named template. The name attribute indicates the name of the argument. The parameter can have a default argument. You specify a default value using either an XPath expression (via the select attribute) or a template as the body of the element.
<xsl:with-param name="…" select="…"> </xsl:with-param> Defines an actual parameter for a template call. The name attribute indicates the matching parameter. The actual value can be expressed using either an XPath expression (via the select attribute) or the body of the element.

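When you set the select attribute of <xsl:apply-templates>, each selected node becomes the context node for the template that processes it. Any further XPath expression that locates the text of a particular node or attribute must be relative to that context. The select attribute of <xsl:with-param>, on the other hand, is simply evaluated in the context of the caller.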

Data Manipulation Instructions

The commands listed in Table 7-2 are helpful for extracting data out of source nodes and then preprocessing it using in-place code.

Table 7-2. XSLT Instructions for Data Manipulation
Instruction Description
<xsl:value-of select="…"> Returns the value of the specified attribute or the text associated with the given node. You select nodes using XPath expressions; attribute names are prefixed with an at sign (@). If the expression selects more than one node, only the first node in document order is used. This command works more or less as a macro that expands at run time.
<xsl:copy-of select="…"> Copies to the output the entire node-set that corresponds to the results of the specified XPath expression, including child nodes and attributes.
<xsl:sort select="…" data-type="…" order="…" case-order="…"> Specifies sort criteria for the node-set being processed by <xsl:for-each> or <xsl:apply-templates> instructions. You use the select attribute to indicate the sort key and data-type to indicate the type of sorting (text or number). The order attribute indicates the direction (ascending or descending), and case-order designates which case comes first in the sort (upper-first or lower-first).
<xsl:eval>FuncName() </xsl:eval> Evaluates a user-defined function and returns the output. The function can access the underlying XML Document Object Model (XML DOM) through the this keyword. The <xsl:eval> tag is a Microsoft extension that belongs to the older XSL dialect implemented by early versions of MSXML (the http://www.w3.org/TR/WD-xsl namespace); it is not part of XSLT 1.0 and is not supported by the .NET Framework transformation classes.

Each XSLT implementation supports a different set of languages for writing user-defined functions. For example, Microsoft’s XML Core Services (MSXML) supports only Microsoft Visual Basic, Scripting Edition (VBScript) and JScript. The .NET Framework transformation classes, on the other hand, include support for C# and Microsoft Visual Basic .NET. (More on this later.)
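In the .NET Framework, user-defined functions are embedded in a style sheet through the Microsoft extension element <msxsl:script>, declared in the urn:schemas-microsoft-com:xslt namespace. The sketch below assumes a hypothetical user namespace (urn:my-scripts) and a hypothetical FullName function. Note that in later versions of the Framework, the XslCompiledTransform class runs embedded script only when script support is explicitly enabled through XsltSettings.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:user="urn:my-scripts">

  <!-- C# code compiled by the .NET transformation classes; its public
       methods become XPath functions bound to the "user" prefix -->
  <msxsl:script language="C#" implements-prefix="user">
    <![CDATA[
    public string FullName(string last, string first)
    {
        return last + ", " + first;
    }
    ]]>
  </msxsl:script>

  <xsl:template match="Employee">
    <!-- string() converts each node-set to the string the method expects -->
    <xsl:value-of select="user:FullName(string(lastname), string(firstname))" />
  </xsl:template>
</xsl:stylesheet>
```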

Note

The syntax shown for the XSLT instructions is deliberately incomplete: I limited the descriptions to the most important and most frequently used attributes. More attributes are actually available; you can find them documented and explained in the MSDN documentation as well as in the resources listed in the section “Further Reading,” on page 343.


Control Flow Instructions

The XSLT vocabulary includes some tags that represent control flow statements such as conditional and iterative statements. Table 7-3 summarizes the most important commands.

Table 7-3. XSLT Instructions for Control Flow
Instruction Description
<xsl:for-each select="…"> </xsl:for-each> Applies the rules in the body to each element that matches the given XPath expression. The node-set can be sorted by placing one or more <xsl:sort> elements at the beginning of the body.
<xsl:if test="…"> </xsl:if> Applies the internal template only if the specified XPath expression evaluates to true.
<xsl:choose> <xsl:when test="…">… </xsl:when> <xsl:otherwise>… </xsl:otherwise></xsl:choose> Similar to the C# switch statement; represents a multiple-choice statement. Each test is expressed using an <xsl:when> element, while the optional <xsl:otherwise> element represents the default choice. The <xsl:when> tests are evaluated in order; the template of the first test that returns true is applied, and the remaining tests are skipped. If no test is successful, the <xsl:otherwise> template, if any, is applied.
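The following template sketches how these instructions combine, assuming the data.xml structure used in this chapter; the labels it prints are purely illustrative.

```xml
<xsl:template match="NorthwindEmployees">
  <UL>
    <xsl:for-each select="Employee">
      <!-- xsl:sort must be the first child of xsl:for-each -->
      <xsl:sort select="lastname" data-type="text" order="ascending" />
      <LI>
        <xsl:value-of select="lastname" />
        <!-- The first xsl:when whose test is true wins -->
        <xsl:choose>
          <xsl:when test="contains(title, 'Manager')">
            <xsl:text> (manager)</xsl:text>
          </xsl:when>
          <xsl:when test="title = 'Sales Representative'">
            <xsl:text> (sales)</xsl:text>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text> (other)</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
        <!-- xsl:if has no else branch; position() follows the sorted order -->
        <xsl:if test="position() = 1">
          <xsl:text> (first)</xsl:text>
        </xsl:if>
      </LI>
    </xsl:for-each>
  </UL>
</xsl:template>
```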

Although this list of commands lacks a for statement, you can still build a loop that runs a specified number of times by filtering a node-set with the XPath position function. Of course, position returns the position of the context node within the node-set being processed and is not a general-purpose counter variable. After all, XSLT instructions are designed to work on XPath node-sets, not to arrange general-purpose programs.
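A minimal sketch, assuming the data.xml structure used in this chapter, that processes only the first three <Employee> nodes. The less-than sign in the predicate must be escaped as &lt; because the style sheet is itself an XML document.

```xml
<xsl:template match="NorthwindEmployees">
  <OL>
    <!-- The predicate keeps only the first three Employee children -->
    <xsl:for-each select="Employee[position() &lt;= 3]">
      <LI><xsl:value-of select="lastname" /></LI>
    </xsl:for-each>
  </OL>
</xsl:template>
```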

Layout Instructions

A typical task for an XSLT script is the creation of new elements and attributes. Sometimes elements and attributes can be hard-coded in the style sheet as literal markup; sometimes, for example when their names depend on the source data, this is just impossible to do. The XSLT statements listed in Table 7-4 let you create layout elements programmatically.

Table 7-4. XSLT Instructions for Layout
Instruction Description
<xsl:element name = "…" namespace = "…"> </xsl:element> Creates an element with the specified name. The namespace attribute indicates the namespace URI of the created element, if any. Both attributes can contain attribute value templates, so the name and namespace can be computed at run time. The body of <xsl:element> is a template for the attributes and children of the created element.
<xsl:attribute name = "…" namespace = "…"> </xsl:attribute> Creates an attribute node and attaches it to an output element. The name attribute denotes the name of the attribute, and namespace indicates the namespace URI, if any. The contents of this element specify the value of the attribute. Note that <xsl:attribute> can also be used inside literal output elements, not only in conjunction with <xsl:element>, as long as it comes before any child content of the element.
<xsl:processing-instruction name="…"> </xsl:processing-instruction> Generates a processing instruction in the output text. The name attribute represents the name of the processing instruction. The contents of the element provide the text of the processing instruction.
<xsl:comment> Generates a comment node in the output text. The text generated by the body of <xsl:comment> appears between the typical comment wrappers <!-- and -->.
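A sketch of these instructions inside templates you could add to a style sheet like the ones in this chapter. The PersonN element name is a hypothetical naming rule, chosen only to show a name computed at run time, which is exactly the case where literal markup cannot be used.

```xml
<xsl:template match="/">
  <!-- Emits a processing instruction ahead of the result's root element -->
  <xsl:processing-instruction name="xml-stylesheet">type="text/xsl" href="emplist.xsl"</xsl:processing-instruction>
  <EmployeeList>
    <xsl:apply-templates select="MyDataSet/NorthwindEmployees/Employee" />
  </EmployeeList>
</xsl:template>

<xsl:template match="Employee">
  <!-- Emits a comment node such as: Employee 1 -->
  <xsl:comment>Employee <xsl:value-of select="employeeid" /></xsl:comment>
  <!-- Hypothetical naming rule: "Person" followed by the employee ID;
       the name attribute is an attribute value template -->
  <xsl:element name="{concat('Person', employeeid)}">
    <!-- Attributes must be added before any child content -->
    <xsl:attribute name="title">
      <xsl:value-of select="title" />
    </xsl:attribute>
    <xsl:value-of select="lastname" />
  </xsl:element>
</xsl:template>
```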

In addition to the instructions described in this section, the XSLT vocabulary contains a few more elements to define data-bound variables (<xsl:variable>), raw text (<xsl:text>), or numbers (<xsl:number>). In particular, a data-bound variable can be given a name and its value calculated either by evaluating an XPath expression or by applying the template in the body of the tag.
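A short sketch of these three elements, assuming an <Employee> context like the one used in this chapter; the fullName and label variable names are hypothetical.

```xml
<xsl:template match="Employee">
  <!-- Variable bound to an XPath expression; variables cannot be reassigned -->
  <xsl:variable name="fullName" select="concat(lastname, ', ', firstname)" />
  <!-- Variable whose value is built by the template in its body -->
  <xsl:variable name="label">
    <I><xsl:value-of select="title" /></I>
  </xsl:variable>
  <P>
    <!-- Position of this Employee among its Employee siblings -->
    <xsl:number />
    <!-- xsl:text emits exactly ". " with no surrounding indentation -->
    <xsl:text>. </xsl:text>
    <xsl:value-of select="$fullName" />
    <xsl:text>: </xsl:text>
    <!-- copy-of keeps the <I> markup stored in the variable -->
    <xsl:copy-of select="$label" />
  </P>
</xsl:template>
```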

After our brief but intensive tour of the XSLT programming interface, let’s see how to turn some of these instructions into concrete calls in a real XSLT script. We’ll look at a couple of typical examples: converting XML documents to HTML pages, and transforming an XML document into an equivalent schema.

From XML to HTML

Let’s return to our faithful XML document (data.xml) from previous chapters and turn it into a compelling HTML page. This sample XML document contains information about the employees in the Northwind database’s Employees table.

The idea is to create a final HTML page that renders the information about employees through a table. The structure of the XSLT script is shown in the following code:

<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    version="1.0">

  <xsl:template match="/">
    <HTML>
      <BODY>
      <H1>Northwind’s Employees</H1>
        <TABLE>
          <xsl:apply-templates 
               select="MyDataSet/NorthwindEmployees/Employee" />
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>

  ⋮
  more templates here
  ⋮
</xsl:stylesheet>

As the match attribute indicates, the main <xsl:template> instruction applies to the root of the XML document. The XSLT script produces a simple HTML page with a fixed H1 heading and a table. The table is generated by applying all matching templates to the nodes that match the following XPath expression:

MyDataSet/NorthwindEmployees/Employee

The actual templates that make the final HTML page are defined later in the document. To start off, you define a template for each <Employee> node, as shown here:

<xsl:template match="Employee">
  <TR>
    <xsl:apply-templates select="employeeid" />
    <xsl:apply-templates select="lastname" />
    <xsl:apply-templates select="title" />
  </TR>
</xsl:template>

The template defines a wrapper table row and then calls into the child templates, one for each significant piece of information to be rendered. As you’ve probably guessed, each child template defines a table cell. For example, the following template selects the <employeeid> node below the current Employee and renders the text of the node in boldface:

  <xsl:template match="employeeid">
    <TD bgcolor="yellow" style="border:1px solid black">
      <B><xsl:value-of select="." /></B>
    </TD>
  </xsl:template>

As you can see, node selection is always performed using XPath expressions. The “.” expression in the select attribute of <xsl:value-of> refers to the current node, so the instruction returns its text. A similar pattern is used for other templates, as follows:

  <xsl:template match="lastname">
    <TD style="border:1px solid black">
      <B><xsl:value-of select="."/></B>, 
         <xsl:value-of select="../firstname"/>
    </TD>
  </xsl:template>

  <xsl:template match="title">
    <TD style="border:1px solid black">
      <I><xsl:value-of select="."/></I>
    </TD>
  </xsl:template>

In the first template, the context node is <lastname>, but at a certain point we need to access a sibling node, <firstname>. The XPath syntax includes the double-dot symbol (..), which is a shortcut for the parent of the current context node; ../firstname therefore moves up to the parent <Employee> node and then selects its <firstname> child. (See Chapter 6.)

The final HTML output for the source XML document is shown in Figure 7-3.

Figure 7-3. The HTML page generated from a source XML file.


To obtain the HTML output as plain text rather than as a rendered page, you must perform the transformation programmatically, using either the MSXML object model or the newer .NET Framework classes. Alternatively, you can view the rendered output in a browser that can apply style sheets directly to XML documents. Microsoft Internet Explorer has provided this capability since version 5.0.

Linking the Style Sheet to the XML Document

Internet Explorer applies a silent and automatic transformation to all XML documents you view through it. However, an XML document can override the default Internet Explorer style sheet by using a processing instruction that simply links an XSLT script.

The following code demonstrates how to add the style sheet from the previous section (emplist.xsl) to the source file (data.xml) so that double-clicking it generates the output shown in Figure 7-3. A style sheet can have either a .xsl or a .xml extension.

<!-- Directly browsable using a custom XSLT script -->
<?xml-stylesheet type="text/xsl" href="emplist.xsl"?>

You register a style sheet with an XML document using a processing instruction with a couple of attributes: type and href. The type attribute must be set to the string text/xsl. The href attribute references the URL of the XSLT script. If you insert more than one processing instruction for XSLT scripts, only the final instruction will be considered.
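To see where the instruction goes, here is a sketch of the top of data.xml, assuming the structure shown in the section “From Schema to Schema” and using the first employee from the sample output. The XML declaration, if present, must come first, and the processing instruction must appear in the prolog, before the root element.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Directly browsable using a custom XSLT script -->
<?xml-stylesheet type="text/xsl" href="emplist.xsl"?>
<MyDataSet>
  <NorthwindEmployees>
    <Employee>
      <employeeid>1</employeeid>
      <lastname>Davolio</lastname>
      <firstname>Nancy</firstname>
      <title>Sales Representative</title>
    </Employee>
  </NorthwindEmployees>
</MyDataSet>
```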

Calling Templates

The previous example used <xsl:apply-templates> exclusively to perform template-based transformations. When you know that only one template applies to a given block of XML source code, you might want to use a more direct instruction: <xsl:call-template>.

If you plan to use the <xsl:call-template> instruction, you must first give the target template a name. For example, the following code defines a template named EmployeeIdTemplate:

  <xsl:template name="EmployeeIdTemplate">
    <TD bgcolor="yellow" style="border:1px solid black">
      <B><xsl:value-of select="employeeid"/></B>
    </TD>
  </xsl:template>

How do you call into this template? Just use the following code:

<xsl:template match="Employee">
  <TR>
    <xsl:call-template name="EmployeeIdTemplate" />
    <xsl:apply-templates select="lastname" />
    <xsl:apply-templates select="title" />
  </TR>
</xsl:template>

There is one difference you should be aware of. With <xsl:apply-templates>, you use the select attribute to select a node-set for the template, as shown here:

   <xsl:apply-templates select="employeeid" />

As a result, the template works on the <employeeid> node and retrieves the value with the following expression:

<xsl:value-of select="." />

When you use the <xsl:call-template> instruction, on the other hand, you call the template by name, and it works on the current context node. Here the context node is <Employee>, so you must explicitly indicate the child node in the select attribute of <xsl:value-of>, as shown here:

<xsl:value-of select="employeeid" />
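If you would rather not depend on the context node at all, you can pass the data in explicitly. The following sketch uses a hypothetical CellTemplate that declares formal parameters with <xsl:param> (one of them with a default value) and receives actual values through <xsl:with-param>:

```xml
<!-- Named template driven only by its arguments -->
<xsl:template name="CellTemplate">
  <xsl:param name="value" />
  <!-- Default used when the caller does not pass a color -->
  <xsl:param name="color" select="'white'" />
  <TD bgcolor="{$color}" style="border:1px solid black">
    <B><xsl:value-of select="$value" /></B>
  </TD>
</xsl:template>

<xsl:template match="Employee">
  <TR>
    <xsl:call-template name="CellTemplate">
      <!-- select is evaluated against the current <Employee> node -->
      <xsl:with-param name="value" select="employeeid" />
      <xsl:with-param name="color" select="'yellow'" />
    </xsl:call-template>
    <xsl:apply-templates select="lastname" />
    <xsl:apply-templates select="title" />
  </TR>
</xsl:template>
```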

From Schema to Schema

Transforming an XML document into an XML document with another schema is, in principle, no different from transforming XML into HTML. The only real difference is that you use another target XML vocabulary.

The following XSLT script is designed to simplify the structure of our sample data.xml file. The original file is structured like this:

<MyDataSet>
  <NorthwindEmployees>
    <Employee>
      <employeeid>…</employeeid>
      <lastname>…</lastname>
      <firstname>…</firstname>
      <title>…</title>
    </Employee>
    ⋮
  </NorthwindEmployees>
</MyDataSet>

The expected target schema is simpler and contains only two levels of nodes, as shown in the following code. In addition, all employee information is now coded using attributes instead of child nodes, and last and first names are merged into a single value.

<Employees database="northwind">
  <Employee id="1" name="Davolio, Nancy" 
      title="Sales Representative" />
  ⋮
</Employees>

The following script performs the magic:

<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    version="1.0">
  <xsl:template match="MyDataSet/NorthwindEmployees">
    <Employees database="northwind">    
      <xsl:for-each select="Employee">
        <xsl:element name="Employee">
          <xsl:attribute name="id">
            <xsl:value-of select="employeeid" />
          </xsl:attribute>
          <xsl:attribute name="name">
            <xsl:value-of select="lastname" />
            <!-- xsl:text keeps line breaks and indentation out of the value -->
            <xsl:text>, </xsl:text>
            <xsl:value-of select="firstname" />
          </xsl:attribute>
          <xsl:attribute name="title">
            <xsl:value-of select="title" />
          </xsl:attribute>
        </xsl:element>
      </xsl:for-each>
    </Employees>
  </xsl:template>
</xsl:stylesheet>

This script includes only one template, which matches the <NorthwindEmployees> node (XSLT's built-in template rules carry processing from the root down to it) and creates a new element for each child <Employee> node. Each new node has three attributes: id, name, and title. The <xsl:value-of> instruction is used to read node values into the newly created attributes. The final output is shown here:

<?xml version="1.0" encoding="utf-8"?>
<Employees database="northwind">
  <Employee id="1" name="Davolio, Nancy" 
      title="Sales Representative"></Employee>
  <Employee id="2" name="Fuller, Andrew" 
      title="Vice President, Sales"></Employee>
  <Employee id="3" name="Leverling, Janet" 
      title="Sales Representative"></Employee>
  <Employee id="4" name="Peacock, Margaret" 
      title="Sales Representative"></Employee>
  <Employee id="5" name="Buchanan, Steve" 
      title="Sales Manager"></Employee>
  <Employee id="6" name="Suyama, Michael" 
      title="Sales Representative"></Employee>
  <Employee id="7" name="King, Robert" 
      title="Sales Representative"></Employee>
  <Employee id="8" name="Callahan, Laura" 
      title="Inside Sales Coordinator"></Employee>
  <Employee id="9" name="Dodsworth, Anne" 
      title="Sales Representative"></Employee>
</Employees>
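The same elements and attribute values can be produced more compactly with literal result elements and attribute value templates, in which an XPath expression in curly braces inside an attribute is evaluated at run time. A sketch of an equivalent style sheet, assuming the same source structure:

```xml
<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    version="1.0">
  <xsl:template match="MyDataSet/NorthwindEmployees">
    <Employees database="northwind">
      <xsl:for-each select="Employee">
        <!-- Each {...} is an attribute value template evaluated
             against the current <Employee> node -->
        <Employee id="{employeeid}"
                  name="{concat(lastname, ', ', firstname)}"
                  title="{title}" />
      </xsl:for-each>
    </Employees>
  </xsl:template>
</xsl:stylesheet>
```

The explicit <xsl:element> and <xsl:attribute> form remains useful when element or attribute names must be computed; when they are fixed, literal markup is usually easier to read.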

As you can see, transforming XML into another arbitrary text-based language is simply a matter of becoming familiar with a relatively small vocabulary of ad hoc tags. The XSLT vocabulary is a bit peculiar because some of its tags look a lot like high-level programming language statements. But grasping the essence of XSLT is not all that difficult.
