The goal of XSL has evolved over time. Today, XSL is a blanket term for a number of derived technologies that together better qualify and implement the original idea of styling XML documents. The various components that fall under the umbrella of XSL are the actual software entities that you use in your code:
XSLT
Rule-based language for transforming XML documents into any other text-based format. XSLT provides for XML-to-XML transformation, which mostly means schema transformation. An XSLT program is a generic set of transformation rules whose output can be any text-based language, including HTML, Rich Text Format (RTF), and Wireless Markup Language (WML), to name just a few.
XPath
Query language that XSLT programs use to select specific parts of an XML document. The result of XPath expressions is then parsed and elaborated by the XSLT processor. Normally, the XSLT processor works sequentially on the source document, but it resorts to XPath if it needs to access and refer to particular groups of nodes. XPath was covered in Chapter 6.
XSL Formatting Objects (XSL-FO)
Advanced styling features expressed by an XML vocabulary that define the semantics of a set of formatting elements. Most of these formatting objects are borrowed from CSS, Level 2 (CSS2) properties, but others have been added. (See the section “Further Reading,” on page 343, for more information.)
XSL and XSLT are not the same thing. XSL still refers to page styling, of which XML transformations to arbitrary text are just one aspect, albeit the most important one. This chapter focuses on the Microsoft .NET Framework implementation of XSLT. Before going any further with the .NET Framework core classes for data transformation, let’s briefly recap the main concepts of XSLT and the programming tools it provides to developers.
XSLT is a process that combines two XML documents—the XML source file and the style sheet—to produce a third document. The resultant document can be an XML document, an HTML page, or any text-based file the style sheet has been instructed to generate.
The source document must meet only one requirement: it must be a well-formed XML document. The style sheet must be a valid XML document that contains the transformation logic expressed using the elements in the XSLT vocabulary. An XSLT style sheet can be seen as a sequence of templates. Each template takes one or more source elements as input and returns some output text based on literals as well as transformed input data. Figure 7-1 illustrates the transformation process.
The core part of the transformation process is the application of templates to XML source elements. Other ancillary steps might include the expansion of elements to text, the execution of some script code, and the selection of a subset of nodes using XPath queries. The layout of a generic XSLT script is shown here:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    ⋮
  </xsl:template>
  <xsl:template match="...">
    ⋮
  </xsl:template>
  ⋮
</xsl:stylesheet>
The root node of an XSLT script is <stylesheet>. The <stylesheet> node belongs to the official W3C namespace for XSLT 1.0. (Note that the .NET Framework supports only XSLT 1.0, but the W3C committees are currently working on a draft of XSLT 1.1.) Below the <stylesheet> node are a variety of <template> nodes, each of which contains a match attribute. The match attribute contains a valid XPath expression that selects the source node (or nodes) that will be used to fill the template.
The template consists of some output literal text interspersed with XSLT placeholder tags. At run time, the XSLT processor reads source data for any matching nodes and dynamically populates all the placeholders. The source markup text is poured into the template in various forms according to the particular XSLT instruction used. Text or attribute values can be copied or preprocessed using script code or extension objects. In addition, you can apply some basic flow constructs such as if, when, and for-each as well as process nodes in a particular order or filtered by an ad hoc XPath expression.
The final output of each template must form a syntactically valid fragment in the target language—be it XML, HTML, RTF, or some other language. You are not required to indicate the target language explicitly, although the XSLT vocabulary provides a tailor-made instruction to declare what the expected output will be. The main requirement for the XSLT style sheet is that its overall text be well-formed XML. In addition, it must make syntactically correct use of all the XSLT instructions it needs. The syntax of each embedded XSLT command, therefore, is validated against the official XSLT schema.
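The instruction for declaring the expected output is <xsl:output>. As a minimal sketch, assuming an HTML target, the declaration sits directly below the root node; the attribute values shown here are illustrative choices, not requirements:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Declare the target language; XSLT 1.0 defines xml, html, and text -->
  <xsl:output method="html" indent="yes" encoding="utf-8" />

  <xsl:template match="/">
    <HTML>
      <BODY>
        <P>Output generated by the style sheet</P>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
```

When the instruction is omitted, the processor chooses a default output method on its own.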
Although an XSLT style sheet is not necessarily composed of explicitly declared templates, in many real-world cases, it is. In other situations, you can have an XSLT style sheet that consists of plain XSLT instructions not grouped as individually callable templates.
A template is to the XSLT language much what a function is to other high-level programming languages. You can group several instructions under a function or a method, but you can also embed instructions directly in the source program so that they run sequentially.
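One way to write a style sheet without explicitly declared templates is the simplified syntax of XSLT 1.0, in which a literal result element acts as the whole style sheet. The following sketch assumes the element names of the sample data.xml document and assumes that the processor in use accepts the simplified form:

```xml
<!-- The root is a literal result element, not xsl:stylesheet. -->
<!-- The xsl:version attribute marks the document as a style sheet. -->
<HTML xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <BODY>
    <H1>Employees</H1>
    <!-- Instructions run in sequence; the context node is the document root -->
    <xsl:for-each select="MyDataSet/NorthwindEmployees/Employee">
      <P><xsl:value-of select="lastname" /></P>
    </xsl:for-each>
  </BODY>
</HTML>
```

The processor treats such a document as if its content were the body of a single template that matches the root.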
In the body of an XSLT style sheet, a template is always defined with in-line code, but it can be configured, and subsequently invoked, in two ways: it can have implicit or explicit arguments. With implicit arguments, you use the match attribute to select the nodes for the template to process. In this case, you apply the template to the matching nodes.
With explicit arguments, you give the template a name and optionally some arguments and let other templates call it explicitly. Like a DLL function, the invoked template can try to determine its context by using XPath expressions, or it can work in isolation, using only the passed arguments. In this case, you call the template to operate on some arguments. We’ll look at some examples of template calls in the section “From XML to HTML,” on page 299. In the meantime, Figure 7-2 illustrates the process of applying templates to nodes.
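The two invocation styles can be sketched side by side. In this hedged example, NameCell, last, and first are hypothetical names chosen for illustration; the element names come from the sample document used in this chapter:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Implicit arguments: the match attribute selects the nodes to process -->
  <xsl:template match="Employee">
    <TR>
      <!-- Explicit arguments: the template is called by name with parameters -->
      <xsl:call-template name="NameCell">
        <xsl:with-param name="last" select="lastname" />
        <xsl:with-param name="first" select="firstname" />
      </xsl:call-template>
    </TR>
  </xsl:template>

  <!-- NameCell (hypothetical) works only on the arguments it receives -->
  <xsl:template name="NameCell">
    <xsl:param name="last" />
    <!-- The default applies only when the caller omits the parameter -->
    <xsl:param name="first" select="'(unknown)'" />
    <TD><xsl:value-of select="$last" />, <xsl:value-of select="$first" /></TD>
  </xsl:template>

</xsl:stylesheet>
```

Because NameCell reads only its parameters, it can be called from any template regardless of the current context node.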
The XSLT vocabulary consists of special tags that represent particular operations you can perform on the source markup text or passed arguments. Although the overall syntax is that of a rigorous XML dialect, you can easily recognize the main constructs of a high-level programming language.
The following subsections summarize the main XSLT instructions you are likely to run across in your XSLT experience. The XSLT instructions are divided into four categories: templates, data manipulation, control flow, and layout.
An XSLT template is a mixed-content template consisting of verbatim text and expandable placeholders. A template can be applied to a selected group of nodes as well as invoked by other templates with or without arguments. Table 7-1 lists the main commands for working with templates. All of these XSLT elements are qualified with the xsl prefix, but bear in mind that xsl is just an arbitrary, although common, namespace prefix. Feel free to replace it with another prefix in your own code.
When you set the select attribute, the template (or the parameter) will execute in the context of the selected nodes. Any further XPath expression to locate the text of a particular node or attribute must be based in that context.
The commands listed in Table 7-2 are helpful for extracting data out of source nodes and then preprocessing it using in-place code.
Instruction | Description |
---|---|
<xsl:value-of select="…"> | Returns the value of the specified attribute or the text associated with the given node. You select nodes using XPath expressions. Of course, attributes must be prefixed with an at sign (@). This command works more or less as a macro that expands at run time. |
<xsl:copy-of select="…"> | Returns the entire node-set that corresponds to the results of the specified XPath expression. |
<xsl:sort select="…" data-type="…" order="…" case-order="…"> | Specifies sort criteria for the node-set being processed by <xsl:for-each> or <xsl:apply-templates> instructions. In this case, you use the select keyword to indicate the sort key and data-type for the type of sorting (text or number). The order attribute indicates the direction, and case-order designates which case comes first in the sort. |
<xsl:eval>FuncName() </xsl:eval> | Evaluates a user-defined function and returns the output. The function can access the underlying XML Document Object Model (XML DOM) using the this keyword as the entry point to the document root node. The <xsl:eval> tag is a Microsoft extension to the XSL implementation. |
Each XSLT implementation supports a different set of languages for writing user-defined functions. For example, Microsoft’s XML Core Services (MSXML) supports only Microsoft Visual Basic, Scripting Edition (VBScript) and JScript. The .NET Framework transformation classes, on the other hand, include support for C# and Microsoft Visual Basic .NET. (More on this later.)
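The standard data manipulation instructions can be combined as in the following sketch. The Report, Entry, and Name elements are hypothetical output names; the source element names are those of the sample document:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="NorthwindEmployees">
    <Report>
      <xsl:for-each select="Employee">
        <!-- xsl:sort must be the first child of xsl:for-each -->
        <xsl:sort select="lastname" data-type="text" order="ascending" />
        <Entry>
          <!-- value-of returns the text of the selected node -->
          <Name><xsl:value-of select="lastname" /></Name>
          <!-- copy-of copies the whole title element, markup included -->
          <xsl:copy-of select="title" />
        </Entry>
      </xsl:for-each>
    </Report>
  </xsl:template>
</xsl:stylesheet>
```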
Note
The syntax shown for the XSLT instructions is largely incomplete. I limited the descriptions to the most important and most frequently used attributes. More attributes are actually available; you can find them documented and explained in the MSDN documentation as well as in the resources listed in the section “Further Reading,” on page 343.
The XSLT vocabulary includes some tags that represent control flow statements such as conditional and iterative statements. Table 7-3 summarizes the most important commands.
Although this list of commands lacks a for statement, you can still realize a loop that runs a specified number of times by using the XPath position function. Of course, position returns the index of the current context node and is not a general variable counter. On the other hand, XSLT instructions are designed to work on XPath node-sets, not to arrange general-purpose programs.
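As a rough illustration of these constructs, the following sketch limits a loop to the first three employees by filtering on position and then branches on node values. The test conditions and the literal labels are assumptions made up for the example:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="NorthwindEmployees">
    <TABLE>
      <!-- A bounded loop: only the first three Employee nodes are selected -->
      <xsl:for-each select="Employee[position() &lt;= 3]">
        <TR>
          <!-- Inside the loop, position() is the index within the selection -->
          <TD><xsl:value-of select="position()" /></TD>
          <TD>
            <xsl:value-of select="lastname" />
            <!-- xsl:if has no else branch -->
            <xsl:if test="title = 'Sales Manager'"> (manager)</xsl:if>
          </TD>
          <TD>
            <!-- xsl:choose plays the role of if/else or switch -->
            <xsl:choose>
              <xsl:when test="employeeid &lt; 5">Low ID</xsl:when>
              <xsl:otherwise>High ID</xsl:otherwise>
            </xsl:choose>
          </TD>
        </TR>
      </xsl:for-each>
    </TABLE>
  </xsl:template>
</xsl:stylesheet>
```

Note that the less-than sign must be escaped as &lt; because the expression lives inside an XML attribute.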
A typical task for an XSLT script is the creation of new elements and attributes. Sometimes attributes and node elements can be hard-coded in script; sometimes this is just impossible to do. The XSLT statements listed in Table 7-4 let you programmatically create layout elements.
In addition to the instructions described in this section, the XSLT vocabulary contains a few more elements to define data-bound variables (<xsl:variable>), raw text (<xsl:text>), or numbers (<xsl:number>). In particular, a data-bound variable can be given a name and its value calculated either by evaluating an XPath expression or by applying the template in the body of the tag.
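Both ways of assigning a variable can be sketched as follows. The variable names fullName and label, and the output attributes, are hypothetical and chosen only for illustration:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="Employee">
    <!-- Value calculated by evaluating an XPath expression -->
    <xsl:variable name="fullName" select="concat(lastname, ', ', firstname)" />

    <!-- Value calculated by applying the template in the body of the tag -->
    <xsl:variable name="label">
      <xsl:text>Employee #</xsl:text>
      <xsl:value-of select="employeeid" />
    </xsl:variable>

    <!-- Element and attributes created programmatically -->
    <xsl:element name="Employee">
      <xsl:attribute name="name"><xsl:value-of select="$fullName" /></xsl:attribute>
      <xsl:attribute name="label"><xsl:value-of select="$label" /></xsl:attribute>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

An XSLT variable is assigned once and cannot be changed afterward, so it behaves more like a named constant than like a variable in a procedural language.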
After our brief but intensive tour of the XSLT programming interface, let’s see how to turn some of these instructions into concrete calls in a real XSLT script. We’ll look at a couple of typical examples: converting XML documents to HTML pages, and transforming an XML document into an equivalent schema.
Let’s return to our faithful XML document (data.xml) from previous chapters and turn it into a compelling HTML page. This sample XML document contains information about the employees in the Northwind database’s Employees table.
The idea is to create a final HTML page that renders the information about employees through a table. The structure of the XSLT script is shown in the following code:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <HTML>
      <BODY>
        <H1>Northwind’s Employees</H1>
        <TABLE>
          <xsl:apply-templates select="MyDataSet/NorthwindEmployees/Employee" />
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>
  ⋮
  more templates here
  ⋮
</xsl:stylesheet>
As the match attribute indicates, the main <xsl:template> instruction applies to the root of the XML document. The XSLT script produces a simple HTML page with a fixed H1 heading and a table. The table is generated by applying all matching templates to the nodes that match the following XPath expression:
MyDataSet/NorthwindEmployees/Employee
The actual templates that make the final HTML page are defined later in the document. To start off, you define a template for each <Employee> node, as shown here:
<xsl:template match="Employee">
  <TR>
    <xsl:apply-templates select="employeeid" />
    <xsl:apply-templates select="lastname" />
    <xsl:apply-templates select="title" />
  </TR>
</xsl:template>
The template defines a wrapper table row and then calls into the child templates, one for each significant piece of information to be rendered. As you’ve probably guessed, each child template defines a table cell. For example, the following template selects the <employeeid> node below the current Employee and renders the text of the node in boldface:
<xsl:template match="employeeid">
  <TD bgcolor="yellow" style="border:1px solid black">
    <B><xsl:value-of select="." /></B>
  </TD>
</xsl:template>
As you can see, the node selection is always performed using XPath expressions. The “.” expression for the <xsl:value-of> node refers to the text of the current node. A similar pattern is used for other templates, as follows:
<xsl:template match="lastname">
  <TD style="border:1px solid black">
    <B><xsl:value-of select="."/></B>, <xsl:value-of select="../firstname"/>
  </TD>
</xsl:template>
<xsl:template match="title">
  <TD style="border:1px solid black">
    <I><xsl:value-of select="."/></I>
  </TD>
</xsl:template>
In the first template, the context node is <lastname>, but at a certain point, we need to access a sibling node—the <firstname> node. The XPath syntax includes the double-dot symbol (..), which is a shortcut for the parent of the current context node. (See Chapter 6.)
The final HTML output for the source XML document is shown in Figure 7-3.
To display the HTML output as plain text, you must perform the transformation programmatically, using either the MSXML object model or the newest .NET Framework classes. Alternatively, you can view the output using a specialized browser with the direct browsing functionality. Microsoft Internet Explorer has provided this capability since version 5.0.
Internet Explorer applies a silent and automatic transformation to all XML documents you view through it. However, an XML document can override the default Internet Explorer style sheet by using a processing instruction that simply links an XSLT script.
The following code demonstrates how to add the style sheet from the previous section (emplist.xsl) to the source file (data.xml) so that double-clicking it generates the output shown in Figure 7-3. A style sheet can have either a .xsl or a .xml extension.
<!-- Directly browsable using a custom XSLT script -->
<?xml-stylesheet type="text/xsl" href="emplist.xsl"?>
You register a style sheet with an XML document using a processing instruction with a couple of attributes: type and href. The type attribute must be set to the string text/xsl. The href attribute references the URL of the XSLT script. If you insert more than one processing instruction for XSLT scripts, only the first instruction will be considered.
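To show where the processing instruction belongs, here is a sketch of how the top of data.xml might look. The single employee record is abbreviated from the sample data used in this chapter, and the exact content of the real file is an assumption:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- The processing instruction goes in the prolog, before the root element -->
<?xml-stylesheet type="text/xsl" href="emplist.xsl"?>
<MyDataSet>
  <NorthwindEmployees>
    <Employee>
      <employeeid>1</employeeid>
      <lastname>Davolio</lastname>
      <firstname>Nancy</firstname>
      <title>Sales Representative</title>
    </Employee>
  </NorthwindEmployees>
</MyDataSet>
```

Here the href value is a relative URL, so the browser would look for emplist.xsl in the same location as the XML document.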
The previous example used <xsl:apply-templates> exclusively to perform template-based transformations. When you know that only one template applies to a given block of XML source code, you might want to use a more direct instruction: <xsl:call-template>.
If you plan to use the <xsl:call-template> instruction, you must first give the target template a name. For example, the following code defines a template named EmployeeIdTemplate:
<xsl:template name="EmployeeIdTemplate">
  <TD bgcolor="yellow" style="border:1px solid black">
    <B><xsl:value-of select="employeeid"/></B>
  </TD>
</xsl:template>
How do you call into this template? Just use the following code:
<xsl:template match="Employee">
<TR>
<xsl:call-template name="EmployeeIdTemplate" />
<xsl:apply-templates select="lastname" />
<xsl:apply-templates select="title" />
</TR>
</xsl:template>
There is one difference you should be aware of. With <xsl:apply-templates>, you use the select attribute to select a node-set for the template, as shown here:
<xsl:apply-templates select="employeeid" />
As a result, the template works on the <employeeid> node and retrieves the value with the following expression:
<xsl:value-of select="." />
When you use the <xsl:call-template> instruction, on the other hand, you call the template by name, and it works on the current context node, which the call does not change. Here the current context node is <Employee>, so you must explicitly indicate the child node in the select attribute of <xsl:value-of>, as shown here:
<xsl:value-of select="employeeid" />
Transforming an XML document into an XML document with another schema is in no way different from transforming XML into HTML. The real difference is that you use another target XML vocabulary.
The following XSLT script is designed to simplify the structure of our sample data.xml file. The original file is structured like this:
<MyDataSet>
  <NorthwindEmployees>
    <Employee>
      <employeeid>…</employeeid>
      <lastname>…</lastname>
      <firstname>…</firstname>
      <title>…</title>
    </Employee>
    ⋮
  </NorthwindEmployees>
</MyDataSet>
The expected target schema is simpler and contains only two levels of nodes, as shown in the following code. In addition, all employee information is now coded using attributes instead of child nodes, and last and first names are merged into a single value.
<Employees database="northwind">
  <Employee id="1" name="Davolio, Nancy" title="Sales Representative" />
  ⋮
</Employees>
The following script performs the magic:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="MyDataSet/NorthwindEmployees">
    <Employees database="northwind">
      <xsl:for-each select="Employee">
        <xsl:element name="Employee">
          <xsl:attribute name="id">
            <xsl:value-of select="employeeid" />
          </xsl:attribute>
          <xsl:attribute name="name">
            <xsl:value-of select="lastname" />, <xsl:value-of select="firstname" />
          </xsl:attribute>
          <xsl:attribute name="title">
            <xsl:value-of select="title" />
          </xsl:attribute>
        </xsl:element>
      </xsl:for-each>
    </Employees>
  </xsl:template>
</xsl:stylesheet>
This script includes only one template rooted in the <NorthwindEmployees> node and creates a new element for each child <Employee> node. The node has a few attributes: id, name, and title. The <xsl:value-of> instruction is used to read node values into the newly created attributes. The final output is shown here:
<?xml version="1.0" encoding="utf-8"?>
<Employees database="northwind">
  <Employee id="1" name="Davolio, Nancy" title="Sales Representative"></Employee>
  <Employee id="2" name="Fuller, Andrew" title="Vice President, Sales"></Employee>
  <Employee id="3" name="Leverling, Janet" title="Sales Representative"></Employee>
  <Employee id="4" name="Peacock, Margaret" title="Sales Representative"></Employee>
  <Employee id="5" name="Buchanan, Steve" title="Sales Manager"></Employee>
  <Employee id="6" name="Suyama, Michael" title="Sales Representative"></Employee>
  <Employee id="7" name="King, Robert" title="Sales Representative"></Employee>
  <Employee id="8" name="Callahan, Laura" title="Inside Sales Coordinator"></Employee>
  <Employee id="9" name="Dodsworth, Anne" title="Sales Representative"></Employee>
</Employees>
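When element and attribute names are known in advance, the same transformation can be written more compactly with literal result elements and attribute value templates, in which an XPath expression in curly brackets is evaluated inside the attribute text. This is an alternative sketch, not the script that produced the output above, and the exact serialization of empty elements depends on the processor:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="utf-8" />
  <xsl:template match="MyDataSet/NorthwindEmployees">
    <Employees database="northwind">
      <xsl:for-each select="Employee">
        <!-- Each {…} is replaced by the string value of the XPath expression -->
        <Employee id="{employeeid}" name="{lastname}, {firstname}" title="{title}" />
      </xsl:for-each>
    </Employees>
  </xsl:template>
</xsl:stylesheet>
```

The <xsl:element> and <xsl:attribute> instructions remain the better choice when a name must be computed at run time or when an attribute is added only under some condition.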
As you can see, transforming XML into another arbitrary text-based language is simply a matter of becoming familiar with a relatively small vocabulary of ad hoc tags. The XSLT vocabulary is a bit peculiar because some of its tags look a lot like high-level programming language statements. But grasping the essence of XSLT is not all that difficult.