In this first week, you have covered a lot of ground, from learning what XSLT is to developing the foundation needed to create complex stylesheets. That foundation is actually so complete that you can already create stylesheets with many different features. This capability is one of the things that makes XSLT so interesting because you can achieve a great deal with only a few elements and functions. In the next two weeks, you will learn about even more elements and functions, but much of that information is also about refining the knowledge gained in this first week.
The samples and exercises of the lessons nearly all focus on one aspect of XSLT: an element, an attribute, or a function. Looking at XSLT this way is very important because you are not distracted by other functionality. The downside to this approach is that you don’t really get an idea of how all the functionality interacts, as is the case in a real-world application. Each week concludes with a bonus project that shows you how to create a real-world application from beginning to end. These projects will focus on what you have learned in the past week.
On Day 1, “Getting Started with XSLT,” I told you that one of the advantages of XML is that it can work with relatively unstructured data such as articles in a paper or magazine or on a Web site. When you have an article in XML, you can easily convert it to other output types, satisfying multiple needs. Bonus Project 1 is about creating a stylesheet for one such output—in this case, HTML. The aim of this project is to use as many of the elements and functions you have learned about in the past week as possible. Therefore, the XSLT document that results from this project will contain different solutions for similar problems. If this were a real-world project, you would probably try to use the same solution for the same types of problems because that would make the XSLT document easier to read and understand. This project, however, is a learning tool with the purpose of showing you the differences and similarities of different approaches. Let’s get started!
In this project, you will create two files: the source XML and the stylesheet used to process the source XML. You can create additional XML sources by using the same XML structure used for the source XML created in this project. The stylesheet is used to transform the XML source to HTML.
The resulting HTML should present the article to the reader in a readable fashion. This means that you will have to create some kind of layout with headers, text, and possibly effects to mark special words. The article is also divided into sections with headers. At the start of the article, a table of contents should show all the article headers. To make the article easy to navigate, you will make these headers into hyperlinks linking to the anchor inserted at the point where the section starts. Figure BP1.1 gives you an idea of what this article will look like.
In Figure BP1.1, the article header is followed by information on when and by whom the article was written. Then it has some brief introductory text, an abstract, and a table of contents with headers as links. The article text starts below the horizontal line, with each section starting with a header. (In Figure BP1.1, you can see only the start of the first section.)
In this project, you will re-create the exact HTML that produces the article shown in Figure BP1.1.
Before you can create a stylesheet, you first need to have an XML structure to create the article or articles in. When you’re developing this structure, it is important that the structure formed by the elements is logical. The relationship between the elements and what the elements represent should be clear. It is therefore also important that the names you choose for elements and attributes are representative of the information they contain. So, if you want to create an XML document containing an article, you would be wise to name the root element article and store the name or names of the person or persons who wrote it in an element called author. What is not a good idea is to name these elements a1 and a2, or ar and au, because these element names may make sense to you but not to anybody else. You can use short names only when there is some kind of convention, such as using the name para for paragraph elements. Listing BP1.1 shows a small article with elements (and attributes) that you would typically find in an article.
Note
You can download the sample listings in this project from the publisher’s Web site.
ANALYSIS
The article in Listing BP1.1 is represented inside a root element called article. As you can see on line 2, this element has two attributes: the id attribute containing a unique identifier for this article and the date attribute containing the date when it was written. The id attribute is somewhat superfluous when you’re storing articles as separate documents but may prove handy when you want to merge files into a larger document. The article element has several child elements with different functions, or rather with different types of data. The title and author elements on lines 3 and 4 are fairly obvious. The title element could just as well have been an attribute of the article element, but whether that is a good idea is arbitrary. There could, however, be more than one author, in which case you would need multiple elements, so having author as an attribute is not possible. The firstname and lastname attributes of the author element speak for themselves.
The intro element on lines 5 through 9 contains introductory text or an abstract about the article. Note that it can contain term elements to denote terms used in the text. These term elements are also used in the text that is stored in para elements, of which there are several on lines 11 through 36. They are child elements of the body element, which is included to denote the article’s body text. A para element can have a header attribute, as shown on lines 11 and 23, but it is not mandatory. Such a header attribute contains the header for the para element it comes with and any following para elements that don’t have a header attribute. Hence, the header attribute on line 11 serves as a header for the para element on line 11, and the header attribute on line 23 serves as a header for the para elements on both lines 23 and 31.
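For orientation, a document with the structure just described might look like the following sketch. This is not the original Listing BP1.1, so the line numbers mentioned in the analysis do not apply to it. All values (the id, the date, the names, and the text) are hypothetical placeholders; only the element and attribute names are taken from the description above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample: only the element and attribute names follow the text -->
<article id="a001" date="2002-01-15">
  <title>Sample Article Title</title>
  <!-- One author element per author; the name parts are attributes -->
  <author firstname="Jane" lastname="Doe"/>
  <intro>
    This introduction mentions <term>XSLT</term> and <term>XML</term>.
  </intro>
  <body>
    <!-- A para with a header attribute starts a new section -->
    <para header="Introduction">Text of the first section, which uses the
      term <term>stylesheet</term>.</para>
    <para header="Templates">Opening paragraph of the second section.</para>
    <!-- No header attribute: this para belongs to the "Templates" section -->
    <para>A follow-on paragraph in the second section.</para>
  </body>
</article>
```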
Note
The elements used in Listing BP1.1 serve as examples. You could easily extend the sample with more elements and attributes to suit your needs.
Creating a stylesheet is best done step by step, slowly building the desired output. This process, of course, always starts with an empty stylesheet, preferably with an XML declaration. The next step is to create a template that matches any element. This template should be empty so that it creates no output. The purpose of this template is to override any built-in rules so that you don’t get any unexpected output from elements you didn’t match. You also need to define the output method and, if needed, any xsl:strip-space and xsl:preserve-space elements. Because you’re creating HTML, whitespace handling is, strictly speaking, not necessary, but in this project the article and body elements will be stripped of space. The result of the efforts so far is shown in Listing BP1.2.
Note
In this project, each code listing expands on Listing BP1.2. The code listings you can obtain from the publisher’s Web site contain a complete stylesheet that you can run, so you can see the differences between using and not using certain elements.
ANALYSIS
The stylesheet in Listing BP1.2 will generate no output. The template matching any element is invoked for the article element in the source, and that’s that. Processing stops and no output is created. Generating output is the task of other templates to be added. The purpose of this stylesheet is to make sure you are not faced with any unexpected and unwanted output.
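A skeleton of this kind could look roughly like the following. It is a minimal sketch reconstructed from the description, not the original Listing BP1.2, and it assumes XSLT 1.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The result is HTML -->
  <xsl:output method="html"/>

  <!-- Drop whitespace-only text nodes directly inside article and body -->
  <xsl:strip-space elements="article body"/>

  <!-- Empty wildcard template: any element without a more specific
       template produces no output and its children are not processed -->
  <xsl:template match="*"/>

</xsl:stylesheet>
```

Because the more specific templates added later have a higher default priority than the wildcard pattern, they take precedence over this empty template without any extra effort.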
Note
In Listing BP1.2, match="*" matches any element. Because you are dealing only with elements here, this works fine. In cases in which matching is invoked for attributes and text, match="/ | * | @* | text()" is an alternative that is often used. The latter expression matches the root node, any element, any attribute, and any text node.
You’re creating HTML, so the output should contain the basic elements needed in HTML. The best place to insert them is a template matching the source document’s root. After all, these elements are themselves root elements, but for an HTML document. Listing BP1.3 shows the root template.
ANALYSIS
When the processor encounters the root node, the template in Listing BP1.3 is matched and the html and body elements are inserted. The xsl:apply-templates element makes sure that processing continues. At this point, the article element it selects is matched by the wildcard template in Listing BP1.2, so processing will stop there. This means that, for now, only the elements inserted here are part of the output.
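A root template along these lines might look as follows. This is a sketch under the same assumptions as before (XSLT 1.0, not the original Listing BP1.3), shown inside a complete stylesheet so that it can be run on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:strip-space elements="article body"/>

  <!-- Empty wildcard template from the skeleton -->
  <xsl:template match="*"/>

  <!-- Root template: inserts the HTML document frame -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Continue with the children of the root node; for now the
             article element ends up in the empty wildcard template -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```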
Now it’s time to really create some output from the source document. Because the result should start with the article title and information on who wrote it and when, this is what you should concentrate on first. This information originates in the article element, which is the first element to be matched after the document root, so getting it first is not a problem. A template matching the article element will therefore be matched when xsl:apply-templates is executed in Listing BP1.3. Listing BP1.4 shows the partial stylesheet responsible for the next phase of the output.
ANALYSIS
Listing BP1.4 may look a little complex, but it is not hard to understand. First, the template matching the article element consists of some output that is created inline, but it also uses both named and matched templates. On line 2, a named template is called to insert the article information. When you use a named template, the code is divided into more understandable pieces. The called template that starts on line 11 inserts the title of the article on line 12 and then proceeds to add the date and author. Everything is surrounded by HTML tags for layout. Note that on lines 16 and 18 the value that is selected doesn’t come from the context node or a direct child node, but from attributes of a child element. Also, note that on line 17 an xsl:text element inserts whitespace. If that whitespace weren’t inserted, the author’s first name and last name would be inserted one after the other without any spaces in between.
Note
In this stylesheet, the author’s name is inserted by referencing the author element. This means that if there is more than one author, only the first in the source document is inserted. If you want to have all of them, you need to use xsl:for-each or additional templates, and add some kind of delimiter. Adding this functionality is a good exercise after you finish Bonus Project 1.
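If you want a starting point for that exercise, one possible approach is sketched below. It assumes the author names are stored in firstname and lastname attributes as described for the source document; the comma delimiter and the surrounding p element are arbitrary choices made for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="article">
    <p>
      <!-- Visit every author child element, not only the first one -->
      <xsl:for-each select="author">
        <xsl:value-of select="@firstname"/>
        <!-- Single space between first name and last name -->
        <xsl:text> </xsl:text>
        <xsl:value-of select="@lastname"/>
        <!-- Delimiter after every author except the last -->
        <xsl:if test="position() != last()">
          <xsl:text>, </xsl:text>
        </xsl:if>
      </xsl:for-each>
    </p>
  </xsl:template>

</xsl:stylesheet>
```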
After the template dealing with the article information is called, the template matching the article element continues to insert HTML elements. Additionally, on line 4 processing continues specifically on the intro element. On lines 6 and 8, processing continues, but in two separate modes, one for the table of contents (TOC) and one for the actual article text. The two separate modes are used because the para elements need to be processed once for the TOC and once for the article text.
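The structure described in this analysis can be sketched as follows. This is not the original Listing BP1.4, so its line numbers differ from those in the text. The template name article-info, the mode names toc and text, and the particular HTML tags are assumptions made for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:strip-space elements="article body"/>

  <xsl:template match="article">
    <!-- Named template inserts title, date, and author -->
    <xsl:call-template name="article-info"/>
    <!-- Continue only with the intro element, in the default mode -->
    <p><i><xsl:apply-templates select="intro"/></i></p>
    <!-- First pass over the child elements: table of contents -->
    <xsl:apply-templates mode="toc"/>
    <hr/>
    <!-- Second pass over the child elements: article text -->
    <xsl:apply-templates mode="text"/>
  </xsl:template>

  <!-- call-template does not change the context node, so the
       paths below are still relative to the article element -->
  <xsl:template name="article-info">
    <h1><xsl:value-of select="title"/></h1>
    <p>
      <xsl:value-of select="@date"/>
      <xsl:text>, by </xsl:text>
      <!-- Values come from attributes of a child element -->
      <xsl:value-of select="author/@firstname"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="author/@lastname"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Run on its own, this sketch falls back on the built-in template rules for both modes, which copy text into the output. It is meant to be combined with the empty wildcard templates for the default mode and for each mode.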
Note
Instead of applying templates in two different modes, you could also use xsl:for-each to select the para elements that have a header attribute and output them. Then, when the article text needs to be created, you can use xsl:apply-templates without any mode.
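A rough sketch of that alternative is shown here. It assumes the para elements are children of a body element, as in the source document described earlier, and uses the header text itself as the anchor name; the br element after each entry is an arbitrary layout choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="article">
    <!-- The predicate [@header] keeps only para elements that
         start a section, so no mode is needed for the TOC -->
    <xsl:for-each select="body/para[@header]">
      <a href="#{@header}"><xsl:value-of select="@header"/></a>
      <br/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```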
Because you’re using templates with modes, you’re inserting a new factor that can cause side effects, such as inserted text, because an element isn’t matched and the default template rule takes over. To make sure that you don’t have any side effects, you need to add empty templates for those modes, just like you did for the default mode. These templates are shown in Listing BP1.5.
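Such templates might look like the following sketch. The mode names toc and text are assumptions made for this illustration; use whatever names your own stylesheet applies templates with.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:strip-space elements="article body"/>

  <!-- Default mode: unmatched elements produce nothing -->
  <xsl:template match="*"/>

  <!-- The same safety net for each mode; without these, the built-in
       rules would copy the text of unmatched elements to the output -->
  <xsl:template match="*" mode="toc"/>
  <xsl:template match="*" mode="text"/>

</xsl:stylesheet>
```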
In Listing BP1.4, you could see that the intro element was specifically selected when applying new templates. To have it inserted into the output, you need to add templates that actually add it. Listing BP1.6 shows the code that inserts this element.
The template on line 1 that matches the intro element does nothing more than re-invoke the processor for its child nodes. Because the value of the intro element contains both text and elements, both the child elements and text nodes need to be matched. In the sample article, the only child elements are term elements, which are matched by the template on line 5. This template inserts the value of those elements and underlines them using HTML underline tags. The template on line 9 matches any text and just inserts it into the output.
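The three templates described here can be sketched as follows. Again, this is a reconstruction from the description rather than the original Listing BP1.6, wrapped in a complete stylesheet so that it can be run on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Pass processing on to the text and term children -->
  <xsl:template match="intro">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Underline terms -->
  <xsl:template match="term">
    <u><xsl:value-of select="."/></u>
  </xsl:template>

  <!-- Copy text as is, without normalizing whitespace -->
  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```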
Note
Because the output is HTML, stripping whitespace is not necessary. If you stripped whitespace from the text nodes using the normalize-space() function, you would inadvertently remove significant whitespace.
The next step in the output is the table of contents (TOC). On line 6 of Listing BP1.4, you saw that a separate mode was invoked to create the TOC. This means that the templates you create to insert the TOC need to use this mode; otherwise, it will not appear in the output. Because there is already a template dealing with any element that is not matched by a specific template in this mode, you need to add templates only for the elements that need to be matched. They are any child nodes of the context element, which is the article element. However, for the TOC, you need to match only the body element and its para child elements, as shown in Listing BP1.7.
ANALYSIS
The template on line 1 in Listing BP1.7 matches the body element. It invokes the processor again only to match its child elements. You may think that because of the built-in template rule, it is not necessary, as the child elements will be processed by the built-in rule if the body element is not matched. The template added in Listing BP1.5 also matches the body element, however, and would stop any further processing.
The template on line 5 matches the para elements. The code in that template is a little more elaborate because it needs to filter out any unwanted paragraphs. The only paragraphs for which something needs to be inserted into the TOC are those that have header attributes. Line 6 tests whether the current para element has a header attribute. If there is no attribute, the code inside the xsl:if element is not executed. If there is a header attribute, a hyperlink is inserted. The value of this hyperlink is, of course, the header itself; the URL to which the hyperlink points actually points to an HTML anchor in the same document. This is achieved with line 7, which creates an a element with an href attribute. The value of the attribute is part static and part dynamic. The dynamic part is enclosed in curly braces and inserts the value of the header attribute.
What the output looks like in a browser will not be influenced by what the actual source output looks like. To make that source a little more readable to somebody requesting it, however, a linefeed is inserted on line 9.
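The TOC templates described in this analysis might be sketched like this. It is not the original Listing BP1.7, so the line numbers in the text do not apply. The mode name toc and the br element are assumptions; the href value combines a static # with the header value in curly braces, as described above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:strip-space elements="article body"/>

  <!-- Safety net for this mode -->
  <xsl:template match="*" mode="toc"/>

  <!-- Needed because the empty wildcard template would otherwise
       match body and stop processing before the para elements -->
  <xsl:template match="body" mode="toc">
    <xsl:apply-templates mode="toc"/>
  </xsl:template>

  <xsl:template match="para" mode="toc">
    <!-- Only paragraphs that start a section get a TOC entry -->
    <xsl:if test="@header">
      <!-- Attribute value template: static "#" plus the header -->
      <a href="#{@header}"><xsl:value-of select="@header"/></a>
      <br/>
      <!-- Linefeed to keep the generated HTML source readable -->
      <xsl:text>&#xA;</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```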
Last but not least, the article body should be inserted into the output. This is, after all, what the entire output is about—no article body…nothing to read. The article body again uses a mode to separate it from other types of sections that need to be inserted into the output. The code is shown in Listing BP1.8.
ANALYSIS
The body element contains only (para) child elements, so you would expect the template in Listing BP1.8 to invoke the processor to match its child elements. Instead, I chose to use an xsl:for-each element that selects and processes each para element. The code inside the xsl:for-each element looks a lot like the code used earlier for the TOC items. Instead of creating a hyperlink, however, line 4 now creates an HTML anchor. Additionally, it doesn’t matter whether the para element contains a header attribute. When it’s time to insert the contents of the para element, xsl:apply-templates is used on line 9 to continue processing. The contents of the para element are similar to those of the intro element discussed earlier, which is why on line 9 no mode is used to match templates. The result is that the templates in Listing BP1.6 are matched, reusing that code.
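A template of this shape could look roughly as follows. As before, it is a sketch rather than the original Listing BP1.8. The mode name text and the h2 and p tags are assumptions, and the anchor name is assumed to be the header value so that it matches the TOC hyperlinks.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:strip-space elements="article body"/>

  <xsl:template match="body" mode="text">
    <!-- Process every para, with or without a header attribute -->
    <xsl:for-each select="para">
      <xsl:if test="@header">
        <!-- Anchor that the TOC hyperlink points to -->
        <h2><a name="{@header}"><xsl:value-of select="@header"/></a></h2>
      </xsl:if>
      <!-- No mode here, so the default-mode templates for term
           and text() are reused for the paragraph contents -->
      <p><xsl:apply-templates/></p>
    </xsl:for-each>
  </xsl:template>

  <!-- Default-mode templates shared with the intro element -->
  <xsl:template match="term">
    <u><xsl:value-of select="."/></u>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```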
If you now apply the entire stylesheet as shown in Listing BP1.9 to Listing BP1.1, you get the output that results in Figure BP1.1. Try it. As I noted earlier, many of the tasks performed in the stylesheet can be achieved in several other ways. It is a good exercise to try to replace some of the code with different code doing the same thing.
Listing BP1.10 shows the result from applying Listing BP1.9 to Listing BP1.1.
Note
For your understanding of each section of code added, you should execute the code and look at the intermediate results as well.