Week 1 In Review

In this first week, you have covered a lot of ground, from learning what XSLT is to developing the foundation needed to create complex stylesheets. That foundation is actually so complete that you can already create stylesheets with many different features. This capability is one of the things that makes XSLT so interesting because you can achieve a great deal with only a few elements and functions. In the next two weeks, you will learn about even more elements and functions, but much of that information is also about refining the knowledge gained in this first week.

Overview of Bonus Project 1

The samples and exercises of the lessons nearly all focus on one aspect of XSLT: an element, an attribute, or a function. Looking at XSLT this way is very important because you are not distracted by other functionality. The downside to this approach is that you don’t really get an idea of how all the functionality interacts, as is the case in a real-world application. Each week concludes with a bonus project that shows you how to create a real-world application from beginning to end. These projects will focus on what you have learned in the past week.

Creating an Article with a Table of Contents

On Day 1, “Getting Started with XSLT,” I told you that one of the advantages of XML is that it can work with relatively unstructured data such as articles in a paper or magazine or on a Web site. When you have an article in XML, you can easily convert it to other output types, satisfying multiple needs. Bonus Project 1 is about creating a stylesheet for one such output—in this case, HTML. The aim of this project is to use as many of the elements and functions you have learned about in the past week as possible. Therefore, the XSLT document that results from this project will contain different solutions for similar problems. If this were a real-world project, you would probably try to use the same solution for the same types of problems because that would make the XSLT document easier to read and understand. This project, however, is a learning tool with the purpose of showing you the differences and similarities of different approaches. Let’s get started!

Project Overview

In this project, you will create two files: the source XML and the stylesheet used to process the source XML. You can create additional XML sources by using the same XML structure used for the source XML created in this project. The stylesheet is used to transform the XML source to HTML.

The resulting HTML should present the article to the reader in a readable fashion. This means that you will have to create some kind of layout with headers, text, and possibly effects to mark special words. The article is also divided into sections with headers. At the start of the article, a table of contents should show all the article headers. To make the article easy to navigate, you will make these headers into hyperlinks, each linking to an anchor inserted at the point where its section starts. Figure BP1.1 gives you an idea of what this article will look like.

Figure BP1.1 Article marked up in

Image

In Figure BP1.1, the article header is followed by information on when and by whom the article was written. Then it has some brief introductory text, an abstract, and a table of contents with headers as links. The article text starts below the horizontal line, with each section starting with a header. (In Figure BP1.1, you can see only the start of the first section.)

In this project, you will re-create the exact HTML resulting in the article in Figure BP1.1.

Creating the Article XML

Before you can create a stylesheet, you first need to have an XML structure to create the article or articles in. When you’re developing this structure, it is important that the structure formed by the elements is logical. The relationship between the elements and what the elements represent should be clear. It is therefore also important that the names you choose for elements and attributes are representative of the information they contain. So, if you want to create an XML document containing an article, you would be wise to name the root element article and store the name or names of the person or persons who wrote it in an element called author. What is not a good idea is to name these elements a1 and a2, or ar and au, because these element names may make sense to you but not to anybody else. You can use short names only when there is some kind of convention, such as using the name para for paragraph elements. Listing BP1.1 shows a small article with elements (and attributes) that you would typically find in an article.

Listing BP1.1 Article Tagged in XML

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <article id="xml0001" date="03/01/2001">
3:    <title>What&apos;s the deal with XML?</title>
4:    <author firstname="Michiel" lastname="van Otegem" />
5:    <intro>It has been a few years since <term>XML</term> was announced as
6:           the technology that would change the web. With 99% of the
7:           websites still using <term>HTML</term>, that statement seems to
8:           have been somewhat optimistic, or has it?
9:    </intro>
10:   <body>
11:     <para header="XML for the Web?">
12:        The idea was that <term>XML</term> would quickly conquer the web.
13:        With most websites still using <term>HTML</term> and only the newer
14:        <device>browsers</device> supporting XML, this is obviously yet to
15:        happen. If it will happen is dependent on if users are willing to
16:        update their current <device>browsers</device> to one supporting
17:        <term>XML</term>, <term>XSLT</term> and related technologies.
18:        Another factor is the ability of <term>HTML</term> developers to
19:        switch to <term>XML</term>/<term>XSLT</term> development. This last
20:        step has proven difficult and without good software to aid the
21:        developers may prove problematic.

22:      </para>
23:      <para header="Is XML a failure?">
24:        <term>XML</term> is most certainly not a failure. The fact that it
25:        hasn&apos;t caught on the front-end doesn&apos;t mean that it
26:        hasn&apos;t caught on at all. <term>XML</term> is very popular as a
27:        means of communicating data between systems and applications.
28:      </para>
29:      <para>
30:        Because <term>XML</term> is a standard way of communicating data
31:        and is capable of representing many data models, it is a natural
32:        choice when communication is needed between systems. Because
33:        <term>XML</term> is a string format any operating system can read
34:        it.
35:      </para>
36:   </body>
37: </article>

Note

You can download the sample listings in this project from the publisher’s Web site.

ANALYSIS

The article in Listing BP1.1 is represented inside a root element called article. As you can see on line 2, this element has two attributes: the id attribute containing a unique identifier for this article and the date attribute containing the date when it was written. The id attribute is somewhat superfluous when you’re storing articles as separate documents but may prove handy when you want to merge files into a larger document. The article element has several child elements with different functions, or rather with different types of data. The title and author elements on lines 3 and 4 are fairly obvious. The title element could just as well have been an attribute of the article element, but whether that is a good idea is arbitrary. There could, however, be more than one author, in which case you would need multiple elements, so having author as an attribute is not possible. The firstname and lastname attributes of the author element speak for themselves.

The intro element on lines 5 through 9 contains introductory text or an abstract about the article. Note that it can contain term elements to denote terms used in the text. These term elements are also used in the text that is stored in para elements, of which there are several on lines 11 through 35. They are child elements of the body element, which is included to denote the article’s body text. A para element can have a header attribute, as shown on lines 11 and 23, but it is not mandatory. Such a header attribute contains the header for the para element it comes with and any following para elements that don’t have a header attribute. Hence, the header attribute on line 11 serves as a header for the para element on line 11, and the header attribute on line 23 serves as a header for the para elements on both lines 23 and 29.

Note

The elements used in Listing BP1.1 serve as examples. You could easily extend the sample with more elements and attributes to suit your needs.

Creating the XSLT Document

Creating a stylesheet is best done step by step, slowly building the desired output. This process, of course, always starts with an empty stylesheet, preferably with an XML declaration. The next step is to create a template that matches any element. This template should be empty so that it creates no output. The purpose of this template is to override any built-in rules so that you don’t get any unexpected output from elements you didn’t match. You also need to define the output method and, if needed, any xsl:strip-space and xsl:preserve-space elements. Because you’re creating HTML, whitespace handling is, strictly speaking, not necessary, but in this project the article and body elements will be stripped of space. The result of the efforts so far is shown in Listing BP1.2.

Listing BP1.2 Stylesheet with Basic Elements

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" version="4.0" encoding="UTF-8" />
  <xsl:strip-space elements="article body" />

  <xsl:template match="*" />
</xsl:stylesheet>

Note

In this project, each code listing expands on Listing BP1.2. The code listings you can obtain from the publisher’s Web site contain a complete stylesheet that you can run, so you can see the differences between using and not using certain elements.

ANALYSIS

The stylesheet in Listing BP1.2 will generate no output. The template matching any element is invoked for the article element in the source, and that’s that. Processing stops and no output is created. Generating output is the task of other templates to be added. The purpose of this stylesheet is to make sure you are not faced with any unexpected and unwanted output.

Note

In Listing BP1.2, match="*" matches any element. Because you are dealing only with elements here, this works fine. In cases in which matching is invoked for attributes and text, match="/ | * | @* | text()" is an alternative that is often used. The latter expression matches the root node, any element, any attribute, and any text node.
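As a minimal sketch of that broader catch-all, the following stylesheet uses the longer match expression from the note in place of match="*". On its own it produces no output for any source document. It is only an empty starting point, and the templates that create output would still have to be added.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" version="4.0" encoding="UTF-8" />

  <!-- Empty catch-all: the root node, elements, attributes, and text
       nodes are all matched here and produce no output. -->
  <xsl:template match="/ | * | @* | text()" />
</xsl:stylesheet>
```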

Creating the HTML Base

You’re creating HTML, so the output should contain the basic elements needed in HTML. The best place to insert them is a template matching the source document’s root. After all, these elements are themselves root elements, but for an HTML document. Listing BP1.3 shows the root template.

Listing BP1.3 Root Template Creating HTML Elements

<xsl:template match="/">
    <html>
       <body>
           <xsl:apply-templates />
       </body>
    </html>
</xsl:template>

ANALYSIS

When the processor encounters the root node, the template in Listing BP1.3 is matched and the html and body elements are inserted. The xsl:apply-templates element makes sure that processing continues. At this point, the article element it selects is matched by the wildcard template in Listing BP1.2, so processing will stop there. This means that, for now, only the elements inserted here are part of the output.

Showing the Article Information

Now it’s time to create some real output from the source document. The result should start with the article title and information on who wrote it and when, so this is what you should concentrate on first. This information originates in the article element, which is the first element to be matched after the document root, so getting it first is not a problem. A template matching the article element will therefore be matched when xsl:apply-templates is executed in Listing BP1.3. Listing BP1.4 shows the partial stylesheet responsible for the next phase of the output.

Listing BP1.4 Partial Stylesheet Creating the Article

1:  <xsl:template match="article">
2:    <xsl:call-template name="info" />
3:    <h2>Abstract</h2>
4:    <p><xsl:apply-templates select="intro" /></p>
5:     <h2>Table Of Contents</h2>
6:    <xsl:apply-templates mode="toc" />
7:     <hr />
8:    <xsl:apply-templates mode="body" />
9:  </xsl:template>
10:
11: <xsl:template name="info">
12:   <h1><xsl:value-of select="title" /></h1>
13:   <p>
14:     Written <xsl:value-of select="@date" />
15:     by <i>
16:       <xsl:value-of select="author/@firstname" />
17:        <xsl:text> </xsl:text>
18:       <xsl:value-of select="author/@lastname" />
19:      </i>
20:   </p>
21: </xsl:template>

ANALYSIS

Listing BP1.4 may look a little complex, but it is not hard to understand. The template matching the article element creates some of its output inline and delegates the rest to named and matched templates. On line 2, a named template is called to insert the article information. Using a named template divides the code into more understandable pieces. The called template that starts on line 11 inserts the title of the article on line 12 and then proceeds to add the date and author. Everything is surrounded by HTML tags for layout. Note that on lines 16 and 18 the value that is selected doesn’t come from the context node or a direct child node, but from attributes of a child element. Also, note that on line 17 an xsl:text element inserts whitespace. If that whitespace weren’t inserted, the author’s first name and last name would be inserted one after the other without any space in between.

Note

In this stylesheet, the author’s name is inserted by referencing the author element. This means that if there is more than one author, only the first in the source document is inserted. If you want to have all of them, you need to use xsl:for-each or additional templates, and add some kind of delimiter. Adding this functionality is a good exercise after you finish Bonus Project 1.
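One possible way to approach that exercise is sketched below. It assumes that every author is a separate author child of the article element with firstname and lastname attributes, and it uses a comma followed by a space as the delimiter, which is my choice and not something the article format prescribes. The small article template is included only so that the sketch runs on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" />

  <!-- Scaffolding: only the article information is created here. -->
  <xsl:template match="article">
    <xsl:call-template name="info" />
  </xsl:template>

  <xsl:template name="info">
    <h1><xsl:value-of select="title" /></h1>
    <p>
      Written <xsl:value-of select="@date" />
      by <i>
        <!-- Visit every author element, not just the first one. -->
        <xsl:for-each select="author">
          <xsl:value-of select="@firstname" />
          <xsl:text> </xsl:text>
          <xsl:value-of select="@lastname" />
          <!-- Assumed delimiter: ", " after every author except the last. -->
          <xsl:if test="position() != last()">
            <xsl:text>, </xsl:text>
          </xsl:if>
        </xsl:for-each>
      </i>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

With the single author in Listing BP1.1, the test on position() and last() is never true, so the result is the same as before.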

After the template dealing with the article information is called, the template matching the article element continues to insert HTML elements. Additionally, on line 4 processing continues specifically on the intro element. On lines 6 and 8, processing continues, but in two separate modes, one for the table of contents (TOC) and one for the actual article text. The two separate modes are used because the para elements need to be processed once for the TOC and once for the article text.

Note

Instead of applying templates in two different modes, you could also use xsl:for-each to select the para elements that have a header attribute and output them. Then, when the article text needs to be created, you can use xsl:apply-templates without any mode.
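The alternative described in this note could look roughly like the following sketch. The predicate [@header] selects only the para elements that have a header attribute. The para template at the end is a simplified placeholder for the article text, added only so that the sketch runs on its own; it is not the template used in this project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" />

  <xsl:template match="article">
    <h2>Table Of Contents</h2>
    <!-- Only para elements with a header attribute become TOC entries. -->
    <xsl:for-each select="body/para[@header]">
      <a href="#{@header}"><xsl:value-of select="@header" /></a><br />
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
    <hr />
    <!-- The article text no longer needs a mode of its own. -->
    <xsl:apply-templates select="body" />
  </xsl:template>

  <!-- Placeholder for the article text (assumption, for illustration only). -->
  <xsl:template match="para">
    <p><xsl:value-of select="." /></p>
  </xsl:template>
</xsl:stylesheet>
```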

Because you’re using templates with modes, you’re inserting a new factor that can cause side effects, such as inserted text, because an element isn’t matched and the default template rule takes over. To make sure that you don’t have any side effects, you need to add empty templates for those modes, just like you did for the default mode. These templates are shown in Listing BP1.5.

Listing BP1.5 Templates Preventing Side Effects

<xsl:template match="*" mode="toc" />
<xsl:template match="*" mode="body" />

Inserting the Abstract

In Listing BP1.4, you could see that the intro element was specifically selected when applying new templates. To have it inserted into the output, you need to add templates that actually add it. Listing BP1.6 shows the code that inserts this element.

Listing BP1.6 Code Responsible for Inserting the Abstract

1:  <xsl:template match="intro">
2:    <xsl:apply-templates />
3:  </xsl:template>
4:
5:  <xsl:template match="term">
6:    <u><xsl:value-of select="." /></u>
7:  </xsl:template>
8:
9:  <xsl:template match="text()">
10:   <xsl:value-of select="." />
11: </xsl:template>

ANALYSIS

The template on line 1 that matches the intro element does nothing more than re-invoke the processor for its child nodes. Because the value of the intro element contains both text and elements, both the child elements and text nodes need to be matched. In the sample article, the only child elements are term elements, which are matched by the template on line 5. This template inserts the value of those elements and underlines them using HTML underline tags. The template on line 9 matches any text and just inserts it into the output.

Note

Because the output is HTML, stripping whitespace is not necessary. If you stripped whitespace from the text nodes using the normalize-space() function, you would inadvertently remove significant whitespace.
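To see why, consider the following sketch, which is shown only to illustrate the pitfall and is not part of the project. It is a cut-down stylesheet that processes just the intro element and normalizes every text node. Because the space before and after each term element belongs to the neighboring text nodes, it is removed, and the words around each term would run together, roughly as since<u>XML</u>was.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" />

  <!-- Scaffolding: look only at the intro element. -->
  <xsl:template match="article">
    <p><xsl:apply-templates select="intro" /></p>
  </xsl:template>

  <xsl:template match="intro">
    <xsl:apply-templates />
  </xsl:template>

  <xsl:template match="term">
    <u><xsl:value-of select="." /></u>
  </xsl:template>

  <!-- Pitfall: leading and trailing whitespace of each text node is
       dropped, including the spaces next to the term elements. -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)" />
  </xsl:template>
</xsl:stylesheet>
```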

Inserting the Table of Contents

The next step in the output is the table of contents (TOC). On line 6 of Listing BP1.4, you saw that a separate mode was invoked to create the TOC. This means that the templates you create to insert the TOC need to use this mode; otherwise, it will not appear in the output. Because there is already a template dealing with any element that is not matched by a specific template in this mode, you need to add templates only for the elements that need to be matched. They are any child nodes of the context element, which is the article element. However, for the TOC, you need to match only the body element and its para child elements, as shown in Listing BP1.7.

Listing BP1.7 Template Creating the Table of Contents

1:  <xsl:template match="body" mode="toc">
2:    <xsl:apply-templates mode="toc" />
3:  </xsl:template>
4:
5:  <xsl:template match="para" mode="toc">
6:     <xsl:if test="@header">
7:       <a href="#{@header}">
8:        <xsl:value-of select="@header" />
9:      </a><br /><xsl:text>&#xA;</xsl:text>
10:   </xsl:if>
11: </xsl:template>

ANALYSIS

The template on line 1 in Listing BP1.7 matches the body element. It invokes the processor again only to match its child elements. You may think that because of the built-in template rule, it is not necessary, as the child elements will be processed by the built-in rule if the body element is not matched. The template added in Listing BP1.5 also matches the body element, however, and would stop any further processing.

The template on line 5 matches the para elements. The code in that template is a little more elaborate because it needs to filter out any unwanted paragraphs. The only paragraphs for which something needs to be inserted into the TOC are those that have header attributes. Line 6 tests whether the current para element has a header attribute. If there is no attribute, the code inside the xsl:if element is not executed. If there is a header attribute, a hyperlink is inserted. The value of this hyperlink is, of course, the header itself; the URL to which the hyperlink points actually points to an HTML anchor in the same document. This is achieved with line 7, which creates an a element with an href attribute. The value of the attribute is part static and part dynamic. The dynamic part is enclosed in curly braces and inserts the value of the header attribute.

What the output looks like in a browser will not be influenced by what the actual source output looks like. To make that source a little more readable to somebody requesting it, however, a linefeed is inserted on line 9.
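The curly braces on line 7 of Listing BP1.7 are a shorthand. As a sketch of the longer form, the same href attribute can be built with an xsl:attribute element, which produces the same result for this project. The article template is included only so that the sketch runs on its own, and it selects the body element directly, which is a simplification of Listing BP1.4.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" />

  <!-- Scaffolding: create only the table of contents. -->
  <xsl:template match="article">
    <xsl:apply-templates select="body" mode="toc" />
  </xsl:template>

  <xsl:template match="body" mode="toc">
    <xsl:apply-templates select="para" mode="toc" />
  </xsl:template>

  <xsl:template match="para" mode="toc">
    <xsl:if test="@header">
      <a>
        <!-- Static part (#) followed by the dynamic part (the header). -->
        <xsl:attribute name="href">
          <xsl:text>#</xsl:text>
          <xsl:value-of select="@header" />
        </xsl:attribute>
        <xsl:value-of select="@header" />
      </a><br /><xsl:text>&#xA;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```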

Inserting the Article Body

Last but not least, the article body should be inserted into the output. This is, after all, what the entire output is about—no article body…nothing to read. The article body again uses a mode to separate it from other types of sections that need to be inserted into the output. The code is shown in Listing BP1.8.

Listing BP1.8 Template Inserting the Article Body

1:  <xsl:template match="body" mode="body">
2:    <xsl:for-each select="para">
3:      <xsl:if test="@header">
4:        <a name="{@header}">
5:          <h3><xsl:value-of select="@header" /></h3>
6:        </a>
7:        <xsl:text>&#xA;</xsl:text>
8:      </xsl:if>
9:      <p><xsl:apply-templates /></p>
10:   </xsl:for-each>
11: </xsl:template>

ANALYSIS

The body element contains only para child elements, so you would expect the template in Listing BP1.8 to invoke the processor to match its child elements. Instead, I chose to use an xsl:for-each element that selects and processes each para element. The code inside the xsl:for-each element looks a lot like the code used earlier for the TOC items. Instead of creating a hyperlink, however, line 4 now creates an HTML anchor. Additionally, the paragraph text is inserted whether or not the para element has a header attribute; only the anchor and header depend on the test on line 3. When it’s time to insert the contents of the para element, xsl:apply-templates is used on line 9 to continue processing. The contents of the para element are similar to those of the intro element discussed earlier, which is why line 9 uses no mode to match templates. The result is that the templates in Listing BP1.6 are matched, reusing that code.
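For comparison, the version you might have expected, in which the body template invokes the processor for its para child elements, could be sketched as follows. The para template in the body mode is my addition and does not appear in this project. The article and term templates are included only so that the sketch runs on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" />
  <xsl:strip-space elements="article body" />

  <!-- Scaffolding: create only the article text. -->
  <xsl:template match="article">
    <xsl:apply-templates select="body" mode="body" />
  </xsl:template>

  <xsl:template match="body" mode="body">
    <xsl:apply-templates select="para" mode="body" />
  </xsl:template>

  <!-- Does the work of the xsl:for-each body in Listing BP1.8. -->
  <xsl:template match="para" mode="body">
    <xsl:if test="@header">
      <a name="{@header}">
        <h3><xsl:value-of select="@header" /></h3>
      </a>
      <xsl:text>&#xA;</xsl:text>
    </xsl:if>
    <!-- No mode here, so the default-mode templates are reused. -->
    <p><xsl:apply-templates /></p>
  </xsl:template>

  <xsl:template match="term">
    <u><xsl:value-of select="." /></u>
  </xsl:template>
</xsl:stylesheet>
```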

If you now apply the entire stylesheet as shown in Listing BP1.9 to Listing BP1.1, you get the output that results in Figure BP1.1. Try it. As I noted earlier, many of the tasks performed in the stylesheet can be achieved in several other ways. It is a good exercise to try to replace some of the code with different code doing the same thing.

Listing BP1.9 Complete Stylesheet for Bonus Project 1

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" version="4.0" encoding="UTF-8" />
  <xsl:strip-space elements="article body" />

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates />
      </body>
    </html>
  </xsl:template>

  <xsl:template match="article">
    <xsl:call-template name="info" />
    <h2>Abstract</h2>
    <p><xsl:apply-templates select="intro" /></p>
    <h2>Table Of Contents</h2>
    <xsl:apply-templates mode="toc" />
    <hr />
    <xsl:apply-templates mode="body" />
  </xsl:template>

  <xsl:template name="info">
    <h1><xsl:value-of select="title" /></h1>
    <p>
      Written <xsl:value-of select="@date" />
      by <i>
        <xsl:value-of select="author/@firstname" />
        <xsl:text> </xsl:text>
        <xsl:value-of select="author/@lastname" />
      </i>
    </p>
  </xsl:template>

  <xsl:template match="intro">
    <xsl:apply-templates />
  </xsl:template>

  <xsl:template match="term">
    <u><xsl:value-of select="." /></u>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:value-of select="." />
  </xsl:template>

  <xsl:template match="body" mode="toc">
    <xsl:apply-templates mode="toc" />
  </xsl:template>

  <xsl:template match="para" mode="toc">
    <xsl:if test="@header">
      <a href="#{@header}">
        <xsl:value-of select="@header" />
      </a><br /><xsl:text>&#xA;</xsl:text>
    </xsl:if>
  </xsl:template>

  <xsl:template match="body" mode="body">
    <xsl:for-each select="para">
      <xsl:if test="@header">
        <a name="{@header}">
          <h3><xsl:value-of select="@header" /></h3>
        </a>
        <xsl:text>&#xA;</xsl:text>
      </xsl:if>
      <p><xsl:apply-templates /></p>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="*" />
  <xsl:template match="*" mode="toc" />
  <xsl:template match="*" mode="body" />
</xsl:stylesheet>

Listing BP1.10 shows the result from applying Listing BP1.9 to Listing BP1.1.

Note

For your understanding of each section of code added, you should execute the code and look at the intermediate results as well.

OUTPUT

Listing BP1.10 Result When Listing BP1.9 Is Applied to Listing BP1.1

<html>
      <body>
      <h1>What's the deal with XML?</h1>
      <p>
        Written 03/01/2001
        by <i>Michiel van Otegem</i></p>
      <h2>Abstract</h2>
      <p>It has been a few years since <u>XML</u> was announced as the
        technology that would change the web. With 99% of the websites still
        using <u>HTML</u>, that statement seems to have been somewhat
        optimistic, or has it?

      </p>
      <h2>Table Of Contents</h2><a href="#XML for the Web?">XML for the Web?</a><br>
      <a href="#Is XML a failure?">Is XML a failure?</a><br>

      <hr><a name="XML for the Web?">
        <h3>XML for the Web?</h3></a>

      <p>
        The idea was that <u>XML</u> would quickly conquer the web. With
        most websites still using <u>HTML</u> and only the newer
        supporting XML, this is obviously yet to
        happen. If it will happen is dependent on if users are willing to
        update their current  to one supporting
        <u>XML</u>, <u>XSLT</u> and related technologies. Another
        factor is the ability of <u>HTML</u> developers to switch to
        <u>XML</u>/<u>XSLT</u> development. This last step has
        proven difficult and without good software to aid the developers may
        prove problematic.

      </p><a name="Is XML a failure?">
        <h3>Is XML a failure?</h3></a>

      <p>
        <u>XML</u> is most certainly not a failure. The fact that it
        hasn't caught on the front-end doesn't mean that it
        hasn't caught on at all. <u>XML</u> is very popular as a
        means of communicating data between systems and applications.

      </p>
      <p>
        Because <u>XML</u> is a standard way of communicating data
        and is capable of representing many data models, it is a natural
        choice when communication is needed between systems. Because
        <u>XML</u> is a string format any operating system can read
        it.
      </p>   
   </body>
</html>
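
If you compare Listing BP1.10 with Listing BP1.1, you may notice that the word browsers is missing from the first paragraph in two places. No template matches the device element, so it falls through to the empty wildcard template and its text is dropped. A minimal sketch of one way to keep that text follows. It assumes that you simply want the content of device inserted without any markup, and it is a cut-down stylesheet that creates only the paragraphs, so that it runs on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" />

  <!-- Scaffolding: walk down from article to the para elements. -->
  <xsl:template match="article | body">
    <xsl:apply-templates />
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates /></p>
  </xsl:template>

  <xsl:template match="term">
    <u><xsl:value-of select="." /></u>
  </xsl:template>

  <!-- Assumption: device text should appear as plain text. -->
  <xsl:template match="device">
    <xsl:value-of select="." />
  </xsl:template>

  <!-- Everything else (title, author, intro) still creates no output. -->
  <xsl:template match="*" />
</xsl:stylesheet>
```

In the complete stylesheet, adding just the device template to Listing BP1.9 should be enough, because a template that names an element takes precedence over the wildcard template.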
