In yesterday’s lesson, you learned how to manipulate string and number data values so that you can display values the way you want to. Until yesterday’s lesson, you could work only with the entire element or attribute value, but with the functions discussed yesterday, you now can work with even part of a value.
Today’s lesson is about sorting values in a node-set so that they are displayed in a certain order. In addition, you will learn how to add different types of numbering so that you can make nicely numbered lists. This capability is mostly useful for documents with chapters, sections, or paragraphs.
Today you will learn how to do the following:
• Sort on specific fields
• Sort in a different order
• Sort dynamically using a parameter
• Use numbering
• Create numbering in different formats
When you select a node-set and display the results using a match template or xsl:for-each, the nodes are always in document order. For unstructured documents, this order is not such a problem, but when you have data that is structured, such as a set of names and addresses, sorting the data before displaying it may be very important. Fortunately, XSLT provides good support for sorting, although you need to be aware of some minor pitfalls.
Sorting basically comes in two flavors: static and dynamic. With static sorting, you know the sort order at design time, so you can create the XSLT to sort on the element or attribute you want to and in the order you want to. With dynamic sorting, you don’t know the sort order at design time, but rather at runtime. This means that you need to have some way to specify the sort element or attribute and the order. Specifying the order is relatively straightforward. Specifying the element or attribute in which to sort, however, is somewhat awkward.
You can specify a sort order by using the xsl:sort element, which can be used in conjunction with xsl:for-each or xsl:apply-templates. The xsl:for-each element is easier to use because you have a better idea of the node-set that you are actually working with; you are pulling in the node-set rather than matching nodes.
xsl:for-each
If you use xsl:sort elements inside an xsl:for-each element, you need to insert these elements before any other element. You can insert an element for each value you want to sort on, with the first element being used first, then the second, and so on. This way, if the first value is the same for two nodes in a node-set, the second determines which node comes first and so on. Finally, if all values sorted on are the same, the elements are sorted in document order.
The xsl:sort element has several attributes. The most important one is select, which holds the sort key. Although this attribute can hold any expression, the expression is evaluated relative to each node in the node-set being sorted, so it must select something those nodes actually have. Otherwise, every node gets the same empty sort key, the sort does nothing at all, and the node-set stays in document order. The select attribute is optional, but for clarity, it is a good idea to always use it. If the select attribute is not specified, select="." is implied, so the sort operates on the value of each node in the node-set being sorted. The other attributes are also optional and will be discussed later. Now let’s look at an example that operates on Listing 12.1.
Listing 12.1 shows a more or less familiar sight. Although you have not seen this listing like this, you’re familiar with its structure. It just has some extra values. Listing 12.2 shows the stylesheet that will be used to sort Listing 12.1.
ANALYSIS
Listing 12.2 is rather straightforward. One template matches the document root. Then an xsl:for-each element selects and loops through all car elements that are child nodes of the cars element (that is, all of them). Next, the xsl:sort elements on lines 9 and 10 tell the processor that the node-set selected by the xsl:for-each element is to be sorted on the manufacturer attribute and then on the model attribute. All subsequent elements within the xsl:for-each element are then used to display the nodes.
As you can see, the select attribute of the xsl:sort element operates on the context of the nodes in the node-set. That’s why @manufacturer does not operate on cars but rather on the nodes that are car elements.
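The structure that this analysis describes can be sketched roughly as follows. This is not Listing 12.2 itself. It assumes only what the text states about the source (a cars root element whose car children carry manufacturer and model attributes); the text output layout is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <xsl:for-each select="cars/car">
      <!-- xsl:sort elements must come before any other child of xsl:for-each -->
      <xsl:sort select="@manufacturer" />
      <!-- Second key: used only when two cars share a manufacturer -->
      <xsl:sort select="@model" />
      <!-- Assumed output layout: one "manufacturer model" pair per line -->
      <xsl:value-of select="@manufacturer" />
      <xsl:text> </xsl:text>
      <xsl:value-of select="@model" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```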
Note
Sorting is performed after the node-set is constructed. This means you can sort a node-set that consists of elements scattered throughout the source document. This is also why you can sort from within an xsl:for-each element.
You can see the result, which is a neat list, in Listing 12.3.
OUTPUT
Sorting in Descending Order
The xsl:sort element has an attribute named order, which determines the sort order of the sort key. The value of this attribute can be either ascending or descending, with ascending being the default, as you may have noticed from the earlier sample. Listing 12.4 shows the xsl:sort elements from Listing 12.2 with the sort order changed.
ANALYSIS
In Listing 12.4, only the first element has a sort order. This means that the model attribute sort key still defaults to ascending instead of descending. In other words, the second xsl:sort element does not take its default sort order from the element immediately above it, but rather redefines it itself. Listing 12.5 shows the output when the sort keys in Listing 12.2 are changed to those in Listing 12.4.
OUTPUT
ANALYSIS
In Listing 12.5, the output is now in reverse order of manufacturers. The car models, however, are still in ascending order.
NEW TERM
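Normally, a node-set is sorted in document order, which is the order in which the nodes in the node-set appear in the source document. You also can order a node-set in reverse document order by using the following sort expression. The data-type attribute (discussed later in this lesson) is needed because positions compared as text would place 10 before 2:
<xsl:sort select="position()" data-type="number" order="descending" />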
Changing Ordering Rules
So far, you’ve had no surprises as to how the nodes were sorted. In the preceding samples, all the values were capitalized, and hence sorted correctly. However, you also need to be aware of rules that determine what happens when lowercase and uppercase letters are mixed. For instance, does Focus precede focus, or the other way around? Unfortunately, the order depends on the default language, which in turn depends on the platform. This means that unless you specify these rules explicitly, the same stylesheet may produce different results on different computers.
Caution
The XSLT specification indicates that the results of processors may differ because of their implementation and platform. So, instead of defining a common standard, XSLT allows for this inconsistency. This means that if you don’t want any surprises, you have to deal with this issue in your stylesheet.
Two attributes have influence on the sorting rules: case-order and lang. The first attribute defines whether uppercase comes before lowercase, or vice versa. The second attribute defines the language settings. The former overrides the case settings of the latter, but some more rules go with the language setting. The language setting, for instance, determines whether ä is treated as a specific character to be placed after z, or if it is treated as a special case of a, in which case it just comes before b. Because this order depends on the language itself, it is hard to say what happens for each language (there are just too many). The best way to find out what happens in the languages you want to target is to try them on different processors. Because not even dictionaries of one language are consistent, processors are likely to be inconsistent as well.
The case-order attribute can have two values: upper-first or lower-first. As I said previously, the default depends on the current language, so unless you specify it, you are not sure which will be used. The following code makes sure that uppercase letters are treated first:
<xsl:sort select="@model" case-order="upper-first" />
When you use the preceding code, Ford comes before ford.
Caution
The value upper-first does not mean that all uppercase letters come first and then all lowercase letters, so everything is not sorted like this: ABCabc. It means that the uppercase version of the same letter comes before the lowercase version of that letter, so everything is sorted like this: AaBbCc.
The lower-first value, of course, means the opposite of upper-first. If you specify another (nonexisting) value, it is ignored and the default is used instead.
When you use the lang attribute, any value that yields a valid language has an impact on the sort order for specific letters. Valid values for the lang attribute are shown in Table 12.1.
Table 12.1 is more than complete when it comes to sorting because it contains values that are of no use for sorting. There are other attributes, such as xml:lang, for which all the values in Table 12.1 are relevant. The significance of user-defined languages is dubious for sorting because the processor has no way of knowing the language settings for user-defined languages, unless they are formally described to the processor. That is obviously very unlikely.
If you want to make sure that the nodes are sorted according to U.S. English rules, your code should look as follows:
<xsl:sort select="@model" lang="en-US" />
The actual sort order depends on the implementation of the processor. The only way to find out that order is to test on a processor-by-processor basis.
Sorting on a Different Data Type
The last attribute of the xsl:sort element is data-type, which specifies the data type used to sort on. By default, this data type is text, but you also can use number if you want the value converted to a number before the ordering is done. Using this data type can be significant because 10,000 comes before 2,000 alphabetically, but not numerically. The data-type attribute also can contain other data types, but XSLT itself does not define what they do, so as yet they have no portable effect.
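A minimal sketch of a numeric sort follows. The year attribute is hypothetical and used only for illustration; the text does not say which extra values Listing 12.1 carries, so substitute whatever numeric attribute or element your source actually has.

```xml
<xsl:for-each select="cars/car">
  <!-- @year is a hypothetical attribute; data-type="number" compares the
       values as numbers, so 900 sorts before 1000 instead of after it -->
  <xsl:sort select="@year" data-type="number" />
  <xsl:value-of select="@model" />
  <xsl:text>&#10;</xsl:text>
</xsl:for-each>
```

Values that cannot be converted to a number (for instance, because they contain a thousands separator) become NaN, and where those end up in the result may differ between processors.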
xsl:apply-templates
Sorting with xsl:apply-templates is tricky, especially if you do not specify a select expression that selects nodes of only one type. The results are somewhat contrary to what you might expect. As I explained earlier, that has to do with the fact that no pull processing is involved in sorting with templates, as is the case with xsl:for-each. Listing 12.6 shows the code from Listing 12.1 with some modifications so that you can see the problems you can get into when sorting with xsl:apply-templates.
ANALYSIS
Listing 12.6 has the same cars as shown earlier but in this case has two different elements: car and model. They are in essence the same, but their names are different, and the model attribute in the car element has been replaced with the name attribute in the model element.
Now suppose you want to sort these elements like before, regardless of the actual element name. Because they have different attribute names, you might think that the stylesheet in Listing 12.7 will do the trick.
ANALYSIS
Listing 12.7 has a different template for the car element (line 20) and the model element (line 29). They do the same thing, except that for the model element an asterisk is added to the output so that you can see the difference. The xsl:apply-templates element, used on line 13, contains three xsl:sort elements. Judging from those elements, the cars are first ordered on manufacturer and then on either the model attribute or the name attribute, depending on which is present. Note the xsl:strip-space element on line 6. If it weren’t present, the result would have several lines of whitespace. Listing 12.8 shows the result.
OUTPUT
ANALYSIS
Everything looks right in Listing 12.8, doesn’t it? Well, you had better look again, and then specifically at the last two lines. Notice that these cars are in reverse order. The model element is used before the car element, even though the car element comes first alphabetically. In Listing 12.7, the xsl:sort element also specified the model attribute before the name attribute, so you’d think that isn’t the problem either. In fact, that is the problem because the last one takes precedence in a competing scenario. So, if you have two different elements and two xsl:sort elements that work on only one of those elements, the last xsl:sort element has precedence over the former. Reversing their order would therefore yield a different result because the elements with a model attribute then would have precedence. Listing 12.9 shows the sorting expression from lines 13–17 of Listing 12.7 with the name and model attributes reversed.
If you replace lines 13–17 in Listing 12.7 with the code in Listing 12.9, you get the result shown in Listing 12.10.
OUTPUT
ANALYSIS
You can see that reversing the attribute sort order in Listing 12.7 has an impact on the result. Although you wouldn’t expect this, the result is now correct.
The problem described here is the result of a misinterpretation of the precedence rules. An element that lacks the attribute named in a sort key contributes an empty string for that key, and an empty string normally sorts ahead of any other value in ascending order, so a key that exists on only one kind of element pushes the other kind to the front. The question of how you can get around it remains, however, because reversing the order works in this instance, but doesn’t always help. The answer is remarkably simple and lies in the select expression. Instead of using two separate elements for the different attributes, you also can make a select expression that selects both attributes. Hence, the sort condition in Listing 12.11 would yield the correct result, as shown in Listing 12.10.
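One way to write such a combined sort condition is sketched below. It is an illustration under assumptions, not a copy of Listing 12.11: the select="cars/*" expression on xsl:apply-templates is a guess at how the car and model elements are reached, and the attribute names are the ones given in the text.

```xml
<xsl:apply-templates select="cars/*">
  <xsl:sort select="@manufacturer" />
  <!-- The union selects whichever of the two attributes the element has,
       so car and model elements compete on a single key -->
  <xsl:sort select="@model | @name" />
</xsl:apply-templates>
```

Because each element carries only one of the two attributes, the union always yields a single node, and its string value becomes the sort key.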
With static sorting under your belt, you can move on to dynamic sorting. Dynamic sorting isn’t very hard, but the obvious way doesn’t work, so you need to find another path. The problem is that you need to define a variable or parameter to use dynamic sorting, but when you use a variable or parameter in the select attribute of xsl:sort, the result does not yield the desired effect. The variable holds the same string for every node, so every node gets the same sort key and the node-set stays in document order. So, the following line does not work:
<xsl:sort select="$sortkey" />
You might think that putting the variable between curly braces will help, just as you would when you want it evaluated while you’re creating dynamic attributes:
<xsl:sort select="{$sortkey}" />
Unfortunately, using curly braces is not allowed in a select attribute, so now what? The solution is to create an expression that checks the variable value against the name of the element or attribute you want to sort on. This solution takes some contemplation because the expression is quite tricky to produce sometimes. You must build an expression that contains the elements or attributes you need to order on and then use a predicate to filter out the specific node you need, using the name() function. If you want to order on an element, the expression looks like this:
*[name() = $sortkey]
This expression looks at all the child elements of the current context, and the predicate keeps only those whose name equals the value in the sortkey variable. You can do the same for attributes by using the following expression:
attribute::*[name() = $sortkey]
This expression performs the same task as the former expression, but the attribute axis makes sure you compare attributes only.
The other attributes of the xsl:sort element fortunately can use the notation with the curly braces because these attributes do not expect a select expression, but rather a string. Listing 12.12 shows a sample using variables to determine the sort order.
ANALYSIS
Listing 12.12 yields the same result as Listing 12.4, as shown in Listing 12.5. However, the result is based on the values of the sort keys and order defined as parameters on lines 7–9. So, you could change the output by adding parameters (for instance, from the command line) when you transform the source XML. Note that on lines 13 and 15 the parameters sortkey1 and sortkey2 define the ordering in the select expression. In addition, on line 14 you set the order by putting curly braces around the parameter so that it is evaluated and its value is used. This sample uses global parameters, but you also can use local parameters, local variables, or global variables. Global parameters are specifically useful in Web sites where you want the user to be able to determine which value to sort on, for instance, in a shopping basket where you might want to sort on price or description.
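The approach in this analysis can be sketched as the stylesheet below. It is a reconstruction under assumptions rather than Listing 12.12 itself: the parameter names sortkey1 and sortkey2 come from the text, while the name sortorder, the default values, and the output layout are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Global parameters; the defaults apply when the caller passes nothing -->
  <xsl:param name="sortkey1" select="'manufacturer'" />
  <xsl:param name="sortorder" select="'descending'" />  <!-- hypothetical name -->
  <xsl:param name="sortkey2" select="'model'" />

  <xsl:template match="/">
    <xsl:for-each select="cars/car">
      <!-- The predicate picks the attribute whose name matches the parameter;
           order accepts curly braces because it expects a string -->
      <xsl:sort select="@*[name() = $sortkey1]" order="{$sortorder}" />
      <xsl:sort select="@*[name() = $sortkey2]" />
      <xsl:value-of select="@manufacturer" />
      <xsl:text> </xsl:text>
      <xsl:value-of select="@model" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

How you pass the parameters depends on the processor you use, so check its documentation for the exact command-line or API syntax.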
Numbering in XSLT is quite elaborate, and the rules surrounding it are quite complicated. I won’t go into all the intricate details but will concentrate on the practical side of numbering. The following sections therefore contain many samples showing the different options you have when using numbering.
Numbering can be inserted at any place within a template or an xsl:for-each element but is usually used at the start of an element, particularly headings and so on. If you understand numbering using templates, you can apply that same knowledge to numbering with xsl:for-each, so I will not discuss this topic separately.
You can insert numbering using the xsl:number element. This element has many attributes, but most are not of any interest when it comes to regular numbering. Depending on the options you use, a number is inserted at the position where you place the element. A number in this context is not necessarily a regular number. It also can be a Roman numeral or a letter, or it can show the section number of parent elements. So, for instance, chapter 3, section 4, paragraph 2 could get the number 3.4.2 or III.iv (b). What the actual output looks like will be discussed in the “Controlling the Numbering Output” section later in this lesson.
The samples used here are all based on Listing 12.13, which is the familiar XML source with the menu.
To give you a reference, the first numbering sample, shown in Listing 12.14, is the most basic you can think of.
ANALYSIS
The stylesheet in Listing 12.14 displays the menu with headers for the appetizers, entrees, and desserts. In addition, line 23 makes sure that some kind of numbering is included. Because the xsl:number element on line 23 has no attributes, it uses all the default values. Listing 12.15 shows the result when Listing 12.14 is applied to Listing 12.13.
OUTPUT
ANALYSIS
In Listing 12.15, each dish element is numbered according to its position related to other dish elements with the same parent element. So, each time a header is shown for appetizers, entrees, or desserts, the numbering starts at 1. It does so because, by default, the level attribute is set to single, which means that only sibling nodes are counted when the number is generated. In this stylesheet, this way of numbering yields the same numbers as using the position() function to get each node’s number. In fact, because of the way the xsl:number element generates the current element’s number, using the position() function is usually faster if all you have to do is simple numbering like this.
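A minimal sketch of such a default-numbered template is shown below. It assumes that each dish element holds its description as text content, which the text does not confirm, and the ". " separator is invented for illustration.

```xml
<xsl:template match="dish">
  <!-- No attributes: level="single", counting siblings that have the same
       name as the current node -->
  <xsl:number />
  <xsl:text>. </xsl:text>
  <xsl:value-of select="." />
  <xsl:text>&#10;</xsl:text>
  <!-- Replacing xsl:number with <xsl:value-of select="position()" /> gives
       the same numbers only when the nodes being processed are exactly the
       dish elements of one parent; whitespace text nodes selected along
       with them would shift the positions -->
</xsl:template>
```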
As long as you stick with templates that operate only on the context node and siblings with the same name, you are all right when it comes to numbering. Be aware, though, that when you create a template that matches more than one element, the numbering is not necessarily done on all those elements. Instead, the nodes with different names are counted separately, so you end up with numbering that is intertwined and looks erratic. Listing 12.16 shows a sample that will make this point more clear.
ANALYSIS
The stylesheet in Listing 12.16 is more or less the same as that in Listing 12.14, but with the exception that this stylesheet operates on Listing 12.6 and that the template on line 16 matches both car and model elements. Also, line 17 now explicitly defines level="single", although this is not necessary. Line 19 outputs the value of either the model or name attribute, to work correctly for both elements. Listing 12.17 shows the result when this stylesheet is applied to Listing 12.6.
OUTPUT
In Listing 12.17, the numbering of the different elements is mixed. Instead of numbering through for all the elements, the elements are counted separately. Because the elements also are not sorted, the count is performed in document order, thus yielding the mixed numbering.
Fortunately, you can get around this problem by specifying which elements need to be counted. If you specify both elements, you end up with a properly numbered set. The count attribute serves this purpose. By default, the count attribute’s value is the name of the context node, so the counts are separated for different elements. If you change the xsl:number element in Listing 12.16 into
<xsl:number level="single" count="car|model" />
the result is as shown in Listing 12.18.
OUTPUT
ANALYSIS
In Listing 12.18, the numbering is now applied equally to both types of elements, resulting in a neatly numbered list, even though different elements are involved.
In the preceding case, you also could have used count="*", but you need to be cautious with it. It might not give you the desired result. It is always a good idea to have as tight a control over numbering as is possible to avoid getting numbers that you didn’t ask for.
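Put into a template, the explicit count might look like the sketch below. It is not Listing 12.16; the attribute names come from the text, while the separators and line breaks are assumptions made for illustration.

```xml
<xsl:template match="car|model">
  <!-- Both element names are counted together, so the list is numbered
       straight through in document order -->
  <xsl:number level="single" count="car|model" />
  <xsl:text>. </xsl:text>
  <xsl:value-of select="@manufacturer" />
  <xsl:text> </xsl:text>
  <!-- Each element carries only one of these two attributes -->
  <xsl:value-of select="@model | @name" />
  <xsl:text>&#10;</xsl:text>
</xsl:template>
```

Naming the elements explicitly keeps the count from picking up any other sibling element that is added to the source later, which is the risk with count="*".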
The examples in the preceding sections were numbered separately when elements had different parent elements. In some cases, however, you might need to number through, getting all elements that have the same name. Fortunately, the level attribute has more options, one of them specifically for this purpose. If you use level="any" in the xsl:number element, you number through all elements that have the same name as the context element, or the elements matched by the count attribute. Listing 12.19 shows what happens if you change line 23 of Listing 12.14 into the following:
<xsl:number level="any" />
OUTPUT
ANALYSIS
In Listing 12.19, using level="any" numbers from the first dish element to the last, regardless of each element’s parent node. In fact, even if the nodes had been on different levels in the document, the numbering would still be on all nodes, sort of like using //dish to select all nodes in the document and then numbering them. The only difference is that in between the parent elements are also matched and handled by their template or templates.
Caution
As with level="single", you need to be really careful when specifying the nodes to count in the count attribute. For instance, count="*" counts on all the elements in the document.
If you want to do composite numbering, like 3.4.2, you can probably use complex expressions to get the number of parent and other ancestor elements. Fortunately, you don’t have to because you can just use the level attribute of the xsl:number element and set its value to multiple. However, if you don’t want to end up with numbering that is basically the same as using level="single", you need to specify the elements you want included in the count. If you don’t do that, only the siblings of the context node will be counted, and there will be no levels. The easiest way to do composite numbering is to use count="*", as follows:
<xsl:number level="multiple" count="*" />
Listing 12.20 shows what happens if you change line 23 of Listing 12.14 into the preceding line.
OUTPUT
ANALYSIS
Listing 12.20 changes level to multiple and adjusts the count attribute. Note that because Listing 12.14 specifies numbering only for the dish elements, only those elements are numbered. The numbering consists of three levels, which is logical because the source XML also consists of three levels: the menu element, the children of the menu element, and the dish elements. As you can see, each child element of the menu element is numbered separately, and each time the numbering for the dish elements starts from the beginning.
In Listing 12.20, numbering starts at the root element, but this result is likely not what you want. After all, what’s the use of having every number start with 1? A better approach would be to start at the child elements of the root element. There are two ways to get around this problem. The first way is to change the count expression so that it omits the menu element (or rather numbers just the dish elements and their parent elements). This means changing line 23 of Listing 12.14 into the following:
<xsl:number level="multiple" count="dish|menu/*" />
This line says “Count all the dish elements and all the child elements of the menu element.” The menu element itself is not counted, as you can see in the result in Listing 12.21.
OUTPUT
ANALYSIS
In Listing 12.21, the numbering starts at the child elements of the menu element. The leading 1 has completely vanished, and now each section is counted separately and has its own number.
You can accomplish the same result by using the from attribute of the xsl:number element. This attribute tells the processor which element or elements serve as the starting point for the composite numbering. This, too, can be a pattern. According to the XSLT specification, only ancestors that are descendants of the nearest ancestor matching the from pattern are counted, so the pattern should name the element just above the first level you want to number. To get the result in Listing 12.21, you also could change line 23 of Listing 12.14 into the following:
<xsl:number level="multiple" count="*" from="menu" />
Now all the elements are counted again, but the starting point is the menu element, so only the elements below it contribute to the number. The difference between this method and the method used previously to get Listing 12.21 is that the former method gets a different result if dish elements are on the same level as or higher up the tree than the menu element. The latter method, however, ignores any elements that are above the level you’re working on. This point might be very important if you’re working with a document that may have the same elements in a different section, but which should be excluded. You then can use the from attribute to make sure you stick to the section you need to work on. In most cases, you should use the from attribute to change the depth of the count rather than the count attribute. That way, you can keep the count attribute simple. You should use the count attribute for this only when you can’t do what you want with just the from attribute.
Note
Both the count and from attributes can be defined dynamically using a variable.
As you learned in the preceding sections, numbering in XSLT can be controlled very well and offers a good alternative to using complex expressions. In simple cases, using the position() function is often faster, but in others, xsl:number is really needed. The latter specifically applies to numbering with something other than numbers, such as Roman numerals or letters.
NEW TERM
The format attribute of the xsl:number element provides numbering formats to be used when the numbering is inserted. The value of this attribute is pattern based, so you can create mixed numbering types, such as II.3.a, 2.C.1, and b.iii.i. Also, it provides the option to use something other than periods to separate the levels, so IV V I or 3.2 (a) are equally possible. Basically, the format consists of two types of tokens, which are (in this context) symbols representing some function or delimiter.
The first type of token represents a numeral in some format. The second type is used for punctuation. These tokens can include periods, commas, spaces, parentheses, brackets, curly braces, and so on. By default, the formatting pattern is 1, which basically means that all numbers are represented as you have seen so far, separated by periods. All options are shown in Table 12.2.
Table 12.2 is not entirely complete. Some languages contain other numerals that are represented in Unicode. For these languages, using those numbering tokens is valid. To use the numbering conventions for a certain language, you also can use the lang attribute, which can have the values shown earlier in Table 12.1. Another attribute that may influence numbering is letter-value, which can have the value alphabetic or traditional. This attribute, however, is not applicable for most languages.
Caution
Support for language-dependent numbering is not required. It is likely that processors do not support numbering types other than those shown in Table 12.2.
The stylesheet in Listing 12.22 is based on Listing 12.14, but with some more changes, so it has composite numbering on multiple levels, with a provided format.
Listing 12.22 inserts numbering on two levels. The first is inserted with each child element of the menu element, as shown on line 17 (also note the inserted whitespace on line 18). The same numbering format is used on line 25 for the dish elements. As you can see, both lines use a format that has three different numbering tokens and different punctuation tokens. The result is shown in Listing 12.23.
OUTPUT
ANALYSIS
In Listing 12.23, the different numbering tokens provide different output, numbering in the chosen format but not inserting the number or letter in the format as is. Also, you can see that the letters numbering the dish elements appear neatly between parentheses. Note, however, that the numbering for the child elements of the menu element is wrong. The opening parenthesis is missing, yet the closing parenthesis is still there. It appears this way because the format defines three numbering tokens where only two are needed. This means that before numbering, you should make sure which level you’re on, for instance, by using count(ancestor::*) or just by knowing at what level the template will be processed.
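One way to avoid the stray punctuation is to give each template a format with exactly as many numbering tokens as it has levels. The sketch below is an illustration, not Listing 12.22: it assumes the menu structure described earlier (a menu element whose children contain dish elements), and the formats, the use of name() for the heading, and the text content of dish are all invented for the example.

```xml
<!-- Heading template: one level, so one numbering token -->
<xsl:template match="menu/*">
  <xsl:number count="menu/*" format="I. " />
  <xsl:value-of select="name()" />
  <xsl:text>&#10;</xsl:text>
  <xsl:apply-templates select="dish" />
</xsl:template>

<!-- Dish template: two levels, so two numbering tokens -->
<xsl:template match="dish">
  <xsl:number level="multiple" count="dish|menu/*" format="I.a) " />
  <xsl:value-of select="." />
  <xsl:text>&#10;</xsl:text>
</xsl:template>
```

With these formats, the third dish of the second section should be numbered along the lines of II.c) while the section heading itself gets just II. and no leftover parenthesis.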
Number Grouping
A last numbering option is grouping numbers, just as you do with large numbers when you format them with format-number(). The two attributes that handle this type of grouping are grouping-size, which is the number of digits to be grouped together, and grouping-separator, which is the character to be used to separate the groups. This method is similar to formatting numbers and hardly ever used, so I will not discuss it further.
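For reference, a minimal sketch of the two attributes working together is shown below. The value attribute supplies a fixed number here purely for illustration.

```xml
<!-- Groups the digits in threes, separated by commas: 1,234,567 -->
<xsl:number value="1234567" grouping-size="3" grouping-separator="," />
```

Both attributes need to be present for grouping to take effect.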
Today you learned that you can sort XML elements in a node-set by using the xsl:sort element. You can do so statically, by defining the sort order explicitly, or dynamically, by using a variable or parameter to define the sort key and order. The latter method is somewhat tricky because of the select attribute. Using this method, you cannot use curly braces to get the value of a variable, so you need to use an expression instead. You can sort a node-set by using xsl:for-each or xsl:apply-templates. In the latter case, you need to be cautious of side effects.
You also learned that xsl:number provides elaborate numbering support, which is specifically handy for documents that contain chapters, sections, and so on. You can number nodes on one level or at any level in the document. Which type you choose determines whether the numbering starts at 1 each time or numbers through the whole document. You can also create composite numbers that show the numbers of the ancestor elements.
Tomorrow you will go on a different path and learn how to split your stylesheet into separate files. This capability provides you with the opportunity to reuse partial stylesheets and use a divide-and-conquer strategy.
Q Using xsl:sort with xsl:apply-templates caused some side effects. Do I need to be careful of any other side effects?
A No. However, it is a good thing to test sorting thoroughly before actually using it. Both sorting and numbering have many options and therefore might behave differently from what you expect, depending on the structure of the source XML.
Q Does the xsl:number element have any other values for the level attribute?
A No. At this point, only single, any, and multiple are valid. If you use an invalid value, the processor might report an error or default to single.
Q Can I use numbering on a sorted node-set?
A You can, but this approach probably won’t yield the result you want. Numbering is performed based on the position of the element or elements in the document, not in the sorted node-set. You can get around this problem by sorting the node-set into a variable and then numbering on the variable, but then you lose some of the numbering options. Exercise 1 shows you what goes wrong.
Q I have seen xsl:number used to format a number value, not for numbering a node-set. Is that possible?
A Yes. You can use xsl:number to format a number. However, numbers are always converted to integers, and the options are limited. The only advantage is that you can convert a number to a character by following the same rules you use to format the numbering. In any other case, format-number() is the best choice.
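As a small illustration of this answer, the value attribute takes an expression whose result is rounded to an integer and then formatted with the same tokens used for numbering. The literal values below are chosen only for the example.

```xml
<!-- Formats 4 as a lowercase Roman numeral: iv -->
<xsl:number value="4" format="i" />

<!-- 4.6 is rounded to 5 before formatting, giving the letter E -->
<xsl:number value="4.6" format="A" />
```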
This workshop tests whether you understand all the concepts you learned today. It is very helpful to know and understand the answers before starting tomorrow’s lesson. You can find the answers to the quiz questions and exercises in Appendix A.
1. True or False: xsl:sort elements are evaluated top to bottom; the top one is always stronger than the bottom one.
2. True or False: Numbering with level="single" can apply to different elements on the same level.
3. What is the benefit of specifying a data type when sorting?
4. If you use <xsl:sort />, what is sorted on and in what order?
5. Does level="multiple" have any effect if count="dish"?
1. Sort Listing 12.6 on manufacturer and model (in ascending order); then number the elements.
2. Number the menu in Listing 12.13 so that the headings are preceded by A, B, or C and the dishes are numbered in Roman numerals per section (no composite number).