Day 11
Working with Strings

Yesterday you learned all about data types and how they affect expressions. When you create output, XML, or HTML, you always create text. The string data type therefore is the most important data type in XSLT because you always come back to it.

In today’s lesson, you will learn how to manipulate strings so that you can create the output you want from just a value, for instance, by outputting a number as currency or formatting data the way you want it. Unlike yesterday’s theoretical lesson, this lesson is about the practical application of the available functions, or tricks if you will.

Today you will learn how to do the following:

• Add strings together

• Strip space properly

• Format strings to show dates, currencies, and more

• Check strings for certain characters

• Transform strings into other strings

Operations on Strings

XSLT offers various operations that you can perform on string values. You can do anything from concatenating strings to getting pieces of strings or transforming strings into something different. These core functions can be combined to perform tasks for which no functions exist.

Gluing Strings Together

In a sense, creating XML documents is gluing pieces of data together as text. Some of these strings are eventually written to the output as element tags or attribute names, but that doesn’t negate the fact that these tags also are text, and tag and attribute names can be created just like any value. The result is output that may or may not contain values that are a combination of several other values. This idea is shown in Listing 11.2, which operates on the familiar XML source shown in Listing 11.1.

LISTING 11.1 Sample XML Document

<?xml version="1.0" encoding="UTF-8"?>
<menu>
  <appetizers title="Work up an Appetite">
    <dish id="1" price="8.95">Crab Cakes</dish>
    <dish id="2" price="9.95">Jumbo Prawns</dish>
    <dish id="3" price="10.95">Smoked Salmon and Avocado Quesadilla</dish>
    <dish id="4" price="6.95">Caesar Salad</dish>
  </appetizers>
  <entrees title="Chow Time!">
    <dish id="5" price="19.95">Grilled Salmon</dish>
    <dish id="6" price="17.95">Seafood Pasta</dish>
    <dish id="7" price="16.95">Linguini al Pesto</dish>
    <dish id="8" price="18.95">Rack of Lamb</dish>
    <dish id="9" price="16.95">Ribs and Wings</dish>
  </entrees>
  <desserts title="To Top It Off">
    <dish id="10" price="6.95">Dame Blanche</dish>
    <dish id="11" price="5.95">Chocolat Mousse</dish>
    <dish id="12" price="6.95">Banana Split</dish>
  </desserts>
</menu>

Note

You can download the sample listings in this lesson from the publisher’s Web site.

LISTING 11.2 Stylesheet Adding Values Together

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:
7:    <xsl:param name="week">1</xsl:param>
8:
9:    <xsl:template match="/">
10:     <xsl:variable name="weekmenu">
11:       <xsl:call-template name="getmenu" />
12:     </xsl:variable>
13:     <xsl:value-of select="$weekmenu/dish[1]" />
14:     <xsl:text>, </xsl:text>
15:     <xsl:value-of select="$weekmenu/dish[2]" />
16:     <xsl:text> and </xsl:text>
17:     <xsl:value-of select="$weekmenu/dish[3]" />
18:     <xsl:text> only $27.95</xsl:text>
19:   </xsl:template>
20:
21:   <xsl:template name="getmenu">
22:     <xsl:copy-of select="/menu/appetizers/dish[position () =
23:         ( ( ($week - 1) mod count (/menu/appetizers/dish)) + 1)]" />
24:     <xsl:copy-of select="/menu/entrees/dish[position () =
25:         ( ( ($week - 1) mod count (/menu/entrees/dish)) + 1)]" />
26:     <xsl:copy-of select="/menu/desserts/dish[position () =
27:         ( ( ($week - 1) mod count (/menu/desserts/dish)) + 1)]" />
28:   </xsl:template>
29: </xsl:stylesheet>

ANALYSIS

Listing 11.2 is a somewhat simplified version of a stylesheet used in an earlier lesson. The template starting on line 21 contains the logic to create a menu of the week. Lines 13–18 are responsible for creating the output of this stylesheet. The xsl:value-of and xsl:text elements create the text in Listing 11.3.

OUTPUT

LISTING 11.3 Result from Applying Listing 11.2 to Listing 11.1

Crab Cakes, Grilled Salmon and Dame Blanche only $27.95

ANALYSIS

In Listing 11.3, the result from Listing 11.2 is one “value” that is a combination of several values and text inserted as literals. This result is not shocking in itself, but it does show you the core of what XSLT is about.

You can achieve the result in Listing 11.3 in a much different way by using a function that is meant to glue strings together—in other words, concatenating several strings. This function, called concat (), is used in Listing 11.4 to create the same result.

LISTING 11.4 Changes to Listing 11.2 to Use concat ()

1:  <xsl:template match="/">
2:    <xsl:variable name="weekmenu">
3:      <xsl:call-template name="getmenu" />
4:    </xsl:variable>
5:    <xsl:value-of select="concat ($weekmenu/dish[1],', ',
6:                                 $weekmenu/dish[2],' and ',
7:                                 $weekmenu/dish[3],' only $27.95')" />
8:  </xsl:template>

ANALYSIS

The key difference between Listing 11.2 and Listing 11.4 is the way the value is created. In Listing 11.4, line 5 employs the concat () function to concatenate strings. This function takes a comma-separated list of two or more values as arguments and glues them together in one string, no matter what the initial data type of the different arguments. The individual arguments are converted with the same rules used by the string () function.
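The conversion of arguments can be seen in a minimal sketch like the following, which is not one of the book's listings and runs against any XML source. The number and the Boolean value are converted the way string () would convert them before they are glued to the literals.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- 2 + 1 is a number and true () is a Boolean; concat () converts
         both to strings, so the output is: Courses: 3, seafood: true -->
    <xsl:value-of select="concat ('Courses: ', 2 + 1, ', seafood: ', true ())" />
  </xsl:template>
</xsl:stylesheet>
```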

You might be wondering why you would use concat () instead of the method used in Listing 11.2, because that method may be more clear in terms of what the output will be. That is certainly true, but you also can use the concat () function in expressions. If you were to use the method from Listing 11.2, you would have to create a variable and use it in the expression. Whether you use concat () in a situation like Listing 11.4 is up to your personal preference, but in expressions, using it is essential. In combination with other functions, it can be quite useful, as I’ll discuss on Day 16, “Advanced Data Selection.”

Checking for Characters in a String

Especially when you work with string values that are quite long, you might want to check if that string contains some character or sequence of characters. Checking for characters is useful in parsing scenarios to see whether a string conforms to a certain syntax. XSLT offers two functions that are helpful in such cases: contains () and starts-with (). These functions are very much related to one another but offer slightly different functionality.

Using the contains () Function

The contains () function checks whether a certain sequence of characters exists in a given string. If the given sequence exists anywhere in the given string, the function returns true; otherwise, it returns false. Apart from parsing and syntax checking, you can use this function to filter out nodes that do or do not contain certain characters or sequences of characters. Using this function gives you much more fine-grained control than matching on entire nodes or values of nodes. An example is shown in Listing 11.5.

LISTING 11.5 Fine-Grained Filtering of Values

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:    <xsl:strip-space elements="*" />
7:
8:    <xsl:template match="/">
9:      <xsl:apply-templates />
10:   </xsl:template>
11:
12:   <xsl:template match="menu/*">
13:     <xsl:value-of select="concat (@title,'&#xA;')" />
14:     <xsl:apply-templates />
15:     <xsl:text>&#xA;</xsl:text>
16:   </xsl:template>
17:
18:   <xsl:template match="dish">
19:     <xsl:if test="not (contains (.,'Salmon') or
20:                   contains (.,'Sea') or
21:                   contains (.,'Crab') or
22:                   contains (.,'Prawn'))">
23:       <xsl:value-of select="." /> $<xsl:value-of select="@price" />
24:       <xsl:text>&#xA;</xsl:text>
25:     </xsl:if>
26:   </xsl:template>
27: </xsl:stylesheet>

ANALYSIS

Listing 11.5 displays a menu based on Listing 11.1. The template on line 12 matches any of the appetizers, entrees, or desserts elements, with line 13 inserting its title and a linefeed. Note that the linefeed is inserted using the &#xA; character reference. The template on line 18 processes each dish element, but not all dish elements are sent to the output. This is caused by the xsl:if element starting on line 19. The test expression used here makes sure that lines 23 and 24 are executed only when the string value of the dish element does not contain 'Salmon', 'Sea', 'Crab', or 'Prawn'. In essence, it takes out any dishes that have something to do with seafood (very handy if you get sick from seafood). The expression uses the contains () function to check for each sequence of characters if it can be found in the value of the current element.

As you can see, the contains () function takes two arguments: the string to be checked and the string to be checked for. The following rules are applied to determine the result:

• The result is true if the first string contains the exact same sequence of Unicode characters as the second string.

• If the second string is empty, the result is always true.

• If the first string is empty, the result is false unless the second is also empty.
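These rules can be tried out with a small sketch such as the one below, which runs against any XML source. The values in the comments follow the rules as listed here; the empty-string cases in particular are worth verifying with your own processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- true: 'Sea' occurs in the first string -->
    <xsl:value-of select="contains ('Seafood Pasta', 'Sea')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- false: the comparison is exact, so case matters -->
    <xsl:value-of select="contains ('Seafood Pasta', 'sea')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- true: the second string is empty -->
    <xsl:value-of select="contains ('Seafood Pasta', '')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- false: the first string is empty and the second is not -->
    <xsl:value-of select="contains ('', 'Sea')" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```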

Caution

The XSLT specification is unclear whether empty strings should be handled as specified in the preceding paragraphs. Most, if not all, processors conform to this interpretation, but it is wise to test whether your processor of choice does as well.

When Listing 11.5 is applied to Listing 11.1, the result looks like Listing 11.6.

OUTPUT

LISTING 11.6 Result from Applying Listing 11.5 to Listing 11.1

Work up an Appetite
Caesar Salad $6.95

Chow Time!
Linguini al Pesto $16.95
Rack of Lamb $18.95
Ribs and Wings $16.95

To Top It Off
Dame Blanche $6.95
Chocolat Mousse $5.95
Banana Split $6.95

ANALYSIS

In Listing 11.6, the result neatly displays all dishes that do not contain seafood (at least not in the name of the dish).

Using the starts-with () Function

The contains () function is useful if you want to check whether a value contains a sequence of characters, but you don’t need to know where the sequence of characters actually occurs in the given string. In many scenarios, you actually want to know if the sequence of characters occurs at a certain position within the given string, in particular if a value starts with a certain sequence of characters. The starts-with () function is supplied just for this purpose. This function is particularly useful to check whether something is missing from a value, such as http:// before a URL. Listing 11.7 shows a sample XML document with links.

LISTING 11.7 Sample XML Document with Links to Web Sites

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <index>
3:    <link href="http://www.w3.org">W3 Consortium</link>
4:    <link href="www.topxml.com">Top XML</link>
5:    <link href="http://www.xml.org">The XML Industry Portal</link>
6:  </index>

ANALYSIS

The link on line 4 of Listing 11.7 does not start with http://, so when you create a link from it in a Web page, chances are the browser will start looking for a file named www.topxml.com on the current Web site instead of going to the other Web site. Listing 11.8 outputs a list of links in HTML to get around this problem.

LISTING 11.8 Stylesheet Dealing with Missing http://

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="html" encoding="UTF-8" />
6:
7:    <xsl:template match="/">
8:      <html>
9:      <body>
10:       <xsl:apply-templates />
11:     </body>
12:     </html>
13:   </xsl:template>
14:
15:   <xsl:template match="link">
16:     <a>
17:       <xsl:attribute name="href">
18:         <xsl:if test="not (starts-with (@href, 'http://'))">
19:           <xsl:text>http://</xsl:text>
20:         </xsl:if>
21:         <xsl:value-of select="@href" />
22:       </xsl:attribute>
23:       <xsl:value-of select="." />
24:     </a><br />
25:   </xsl:template>
26: </xsl:stylesheet>

ANALYSIS

Listing 11.8 basically does nothing shocking. It just creates an HTML document with a list of links. The important stuff happens on lines 18–20, which check whether the value of the href attribute of the link element starts with http://. If it does not, this text is added in front of the value to be inserted into the href attribute of the a element. The result is shown in Listing 11.9.

As you can see, the starts-with () function takes two arguments: the string to be checked and the string to check for. If the first argument starts with the second argument, the function returns true. Basically, the same rules apply as for the contains () function, except that the string to check for must be at the start of the string checked.

OUTPUT

LISTING 11.9 Result from Applying Listing 11.8 to Listing 11.7

<html>
   <body>
      <a href="http://www.w3.org">W3 Consortium</a><br>
      <a href="http://www.topxml.com">Top XML</a><br>
      <a href="http://www.xml.org">The XML Industry Portal</a><br>

   </body>
</html>
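The starts-with () function can also be used in a predicate rather than in an xsl:if test. The following sketch, which is not one of the book's listings, reports only the links in Listing 11.7 that lack the http:// prefix. It deliberately checks for http:// only, so a link starting with https:// would be reported as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- Select only the link elements whose href does not start with http:// -->
    <xsl:for-each select="index/link[not (starts-with (@href, 'http://'))]">
      <xsl:value-of select="concat (@href, '&#xA;')" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to Listing 11.7, this should output only www.topxml.com.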

Both starts-with () and contains () are useful functions to check values. They are, however, not meant to perform certain tasks, such as sorting. You probably could get these functions to work, but doing so would take a lot of code. XSLT offers elements for sorting, so you should use them instead.

Tip

Don’t use starts-with () to sort values. Instead, use the sorting and numbering constructs discussed on Day 12, “Sorting and Numbering.”

You don’t necessarily have to check element or attribute values with the contains () and starts-with () functions. If you use them in combination with the name () function, you also can check whether element or attribute names conform to certain needs. This way, you can create wildcard expressions that only match (or select) nodes with certain characters in their name. Listing 11.10 shows this principle in action.

LISTING 11.10 Stylesheet Filtering Elements on Name

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:    <xsl:strip-space elements="*" />
7:
8:    <xsl:template match="/">
9:      <xsl:apply-templates select="menu/*" />
10:   </xsl:template>
11:
12:   <xsl:template match="menu/*[starts-with (name (),'ent')]">
13:     <xsl:value-of select="concat (@title,'&#xA;')" />
14:     <xsl:apply-templates />
15:     <xsl:text>&#xA;</xsl:text>
16:   </xsl:template>
17:
18:   <xsl:template match="dish">
19:       <xsl:value-of select="." /> $<xsl:value-of select="@price" />
20:       <xsl:text>&#xA;</xsl:text>
21:   </xsl:template>
22:
23:   <xsl:template match="*" />
24: </xsl:stylesheet>

ANALYSIS

Listing 11.10 is similar to earlier listings in that it creates a list of dishes based on the menu XML document in Listing 11.1. This time, however, you get only a list of entrees. This result has to do with line 12, which has a predicate that matches only child elements of the menu element whose names start with ent. The empty template on line 23 discards all other elements. In Listing 11.1, only the entrees element qualifies, but if another element (say, entradas) were in there as well, that element also would match. This search is similar to searches in databases or on a hard disk with a search string such as ent*, with the * character being a wildcard character. The result from applying Listing 11.10 to Listing 11.1 is shown in Listing 11.11.

OUTPUT

LISTING 11.11 Result from Applying Listing 11.10 to Listing 11.1

Chow Time!
Grilled Salmon $19.95
Seafood Pasta $17.95
Linguini al Pesto $16.95
Rack of Lamb $18.95
Ribs and Wings $16.95

Getting the Length of a String

The length of a string in itself is not very valuable information. It can, however, be useful when you’re determining what some output should look like. When you’re outputting HTML, the length may not be very significant because of the way HTML deals with formatting, but in scenarios in which much more control is needed over the output, this information may be very significant. For example, say you have to deal with a fixed width for a text document. In this situation, it is likely that the length of a string is used in conjunction with other functions performing operations on strings. When you create HTML pages with forms, you can also use this function to dynamically determine the size of a text box.

You can get the length of a string by using the string-length () function, which takes one argument: the string for which you want to get the length. This argument is optional, however. If you do not specify it, the length of the string value of the context node is returned instead.
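As a rough sketch of the fixed-width scenario, the following stylesheet pads each dish name in Listing 11.1 with spaces so that all prices line up in one column. The pad variable is a hypothetical helper; it is assumed to hold more spaces than the longest dish name has characters (36 in Listing 11.1). The sketch also relies on the substring () function, which is discussed later in this lesson.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />
  <xsl:strip-space elements="*" />

  <!-- Hypothetical helper: a run of spaces, assumed to be longer than
       the longest dish name -->
  <xsl:variable name="pad"
      select="'                                        '" />

  <xsl:template match="dish">
    <!-- string-length () without an argument measures the dish name;
         the longer the name, the fewer spaces are taken from $pad, so
         the price starts in the same column on every line -->
    <xsl:value-of select="concat (., substring ($pad, string-length () + 1),
                                  '$', @price, '&#xA;')" />
  </xsl:template>
</xsl:stylesheet>
```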

Suppose you have to create the menu again, but now have a fixed width that does not allow dish names to be too long. You could create a stylesheet that takes into account the length of the dish name and price. Listing 11.12 does exactly this.

LISTING 11.12 Stylesheet Creating a Menu Without Long Names

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:    <xsl:strip-space elements="*" />
7:
8:    <xsl:template match="/">
9:      <xsl:apply-templates select="menu/*" />
10:   </xsl:template>
11:
12:   <xsl:template match="menu/*">
13:     <xsl:value-of select="concat (@title,'&#xA;')" />
14:     <xsl:apply-templates />
15:     <xsl:text>&#xA;</xsl:text>
16:   </xsl:template>
17:
18:   <xsl:template match="dish">
19:      <xsl:if test="string-length () + string-length (@price) &lt; 19">
20:        <xsl:value-of select="." /> $<xsl:value-of select="@price" />
21:        <xsl:text>&#xA;</xsl:text>
22:     </xsl:if>
23:   </xsl:template>
24: </xsl:stylesheet>

ANALYSIS

Listing 11.12 again does much of what was done in earlier samples. This time, line 19 checks whether the length of the dish name and its price exceed 18 characters. Only if the length is 18 characters or fewer does the dish make it into the menu. In the test expression, the string-length () function is used first to get the length of the value of the context node. This is why no argument is supplied. The second time the function is used, the price attribute is given as an argument because you need to add the length of that value to the length of the dish name. The result is shown in Listing 11.13.

OUTPUT

LISTING 11.13 Result from Applying Listing 11.12 to Listing 11.1

Work up an Appetite
Crab Cakes $8.95
Jumbo Prawns $9.95
Caesar Salad $6.95

Chow Time!
Seafood Pasta $17.95
Rack of Lamb $18.95

To Top It Off
Dame Blanche $6.95
Banana Split $6.95

ANALYSIS

Listing 11.13 is not very wide. It contains only those dishes that meet the length requirement. One appetizer and one dessert didn’t make the cut, as well as three entrees.

Note

If you want to check whether a string is empty, use the boolean () function instead of string-length (). The expressions string-length ('abc') > 0 and boolean ('abc') yield the same result: both return true because the string is not empty, and both return false for an empty string. The boolean () function was specifically designed for these sorts of tasks, so it is likely to perform better than the former expression.
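The following sketch, which runs against any XML source, shows how boolean () behaves for string arguments. One caveat to keep in mind: when the argument is a node-set, such as @title, boolean () tests whether a node was selected at all, not whether its value is empty, so you would convert the node to a string first, as in boolean (string (@title)).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- true: the string is not empty -->
    <xsl:value-of select="boolean ('abc')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- false: an empty string converts to false -->
    <xsl:value-of select="boolean ('')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- true: the same check written with string-length () -->
    <xsl:value-of select="string-length ('abc') &gt; 0" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```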

Working with Partial Strings

The functions discussed so far have provided you with information regarding a string. When you actually want to do something with a string, you need different functions. XSLT provides several functions to obtain only a part of a string, either by specifying the position of a substring or a portion of a string that occurs before or after a certain sequence of characters. Let’s look at these functions with a series of examples based on the sample XML document in Listing 11.14.

LISTING 11.14 Sample XML Document with a File List

<?xml version="1.0" encoding="UTF-8"?>
<folder name="My Files">
  <file>adresses.mdb</file>
  <file>basket.doc</file>
  <file>house.dwg</file>
  <file>names.xml</file>
  <file>namesout.xsl</file>
</folder>

ANALYSIS

Listing 11.14 contains a list of files in a folder, and each file has a different file extension. The object is now to create a text file listing these files and telling you what each file type is, based on the extension. Listing 11.15 shows a stylesheet that does exactly this.

LISTING 11.15 Stylesheet Creating a File List with File Types

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:    <xsl:strip-space elements="*" />
7:
8:    <xsl:template match="/">
9:       <xsl:apply-templates />
10:   </xsl:template>
11:
12:   <xsl:template match="folder">
13:     <xsl:value-of select="concat ('&lt;',@name,'&gt;&#xA;')" />
14:     <xsl:apply-templates />
15:     <xsl:text>&#xA;</xsl:text>
16:   </xsl:template>
17: 
18:   <xsl:template match="file">
19:     <xsl:variable name="len" select="string-length ()" />
20:     <xsl:variable name="ext" select="substring (., $len - 2)" />
21:     <xsl:value-of select="concat (., '&#x20;')" />
22:     <xsl:choose>
23:       <xsl:when test="$ext = 'doc'">
24:         <xsl:text> (Word document)</xsl:text>
25:       </xsl:when>
26:       <xsl:when test="$ext = 'dwg'">
27:        <xsl:text> (AutoCad drawing)</xsl:text>
28:       </xsl:when>
29:       <xsl:when test="$ext = 'xml'">
30:         <xsl:text> (XML document)</xsl:text>
31:       </xsl:when>
32:       <xsl:when test="$ext = 'xsl'">
33:         <xsl:text> (XSL stylesheet)</xsl:text>
34:       </xsl:when>
35:     </xsl:choose>
36:     <xsl:text>&#xA;</xsl:text>
37:   </xsl:template>
38: </xsl:stylesheet>

ANALYSIS

The template on line 12 of Listing 11.15 creates a line of output for each folder in the source XML, surrounded by < and > characters—for example, <myfolder>.

Line 13 takes care of this task by using the concat () function. Of course, you could use xsl:text elements, but this approach is far shorter to write and possibly easier to understand. The template that starts on line 18 deals with each file and is the place where the real action is. First, a variable is created on line 19 to hold the length of the value in the context node. Strictly speaking, a variable is not necessary because you need the length only once. On line 20, a variable is created to hold the file extension. As you can see, the value of that variable is determined with the substring () function. The first argument passed to it is the context node; the second argument is the starting position. Any character at or after the starting position is part of the new value. Because the last position in the string is equal to its length, and the extension is always three characters long, the starting position is the string length minus two, which is exactly what the expression says. Line 21 outputs the value of the context node and puts a space after it. Then the xsl:choose element on line 22 makes sure that if the extension from the file is known to the stylesheet, a full file type is shown in the output, as you can see in Listing 11.16.

OUTPUT

LISTING 11.16 Result from Applying Listing 11.15 to Listing 11.14

<My Files>
adresses.mdb
basket.doc  (Word document)
house.dwg  (AutoCad drawing)
names.xml  (XML document)
namesout.xsl  (XSL stylesheet)

ANALYSIS

The result in Listing 11.16 shows that no file type is known for .mdb files. The others all show the full file type.

The substring () function, as used in Listing 11.15, takes two arguments: the string to get a substring from and the position to start. Note that positions in a string in XSLT start at 1 and not at 0, as is the custom in languages such as C++ and Java. Each character is counted as one character, no matter how it was encoded. So, the character reference &#xA; is counted as one character, as are Unicode surrogate pairs that go beyond the usual Unicode boundary of 65,536 characters.

The substring () function has one more argument, which is optional: the length of the string you want to get. By using this argument, you can get a substring from the start or middle of a string rather than only at the end. For instance, substring ('namesout.xsl',6,3) returns 'out'.
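The two-argument and three-argument forms can be compared in a short sketch like the following, which runs against any XML source and uses the file name from the example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- names: five characters starting at position 1 -->
    <xsl:value-of select="substring ('namesout.xsl', 1, 5)" />
    <xsl:text>&#xA;</xsl:text>
    <!-- out: three characters starting at position 6 -->
    <xsl:value-of select="substring ('namesout.xsl', 6, 3)" />
    <xsl:text>&#xA;</xsl:text>
    <!-- xsl: no length given, so everything from position 10 onward -->
    <xsl:value-of select="substring ('namesout.xsl', 10)" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```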

Getting a Substring Before or After Other Characters

Two XSLT functions that are very much related are substring-before () and substring-after (). Both functions take two arguments: the string being searched and the string to search for. If an occurrence of the second argument is found in the first, substring-before () returns the string up to that occurrence, excluding the occurrence itself. substring-after () does exactly the same, but returns the string starting after the occurring characters. If the first string does not contain an occurrence of the second argument, an empty string is returned.

Caution

If an empty string is returned, this could also mean that the first string is equal to the second or that the first string starts or ends with the second string, depending on the function you used. You can use contains () and string-length () to check whether this is the case.
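One way to tell these cases apart is sketched below. The file name .profile is a hypothetical sample value, chosen because it starts with a period, so substring-before () returns an empty string even though the period is present.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Hypothetical sample value -->
  <xsl:variable name="name" select="'.profile'" />

  <xsl:template match="/">
    <xsl:choose>
      <!-- Empty result because the period is not there at all -->
      <xsl:when test="not (contains ($name, '.'))">
        <xsl:text>no period found</xsl:text>
      </xsl:when>
      <!-- Empty result because nothing precedes the period -->
      <xsl:when test="string-length (substring-before ($name, '.')) = 0">
        <xsl:text>name starts with a period</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="substring-before ($name, '.')" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With the sample value, the second branch applies.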

Listing 11.17 performs the same task as Listing 11.15, but it has been changed in several places, among others to show the use of substring-after ().

LISTING 11.17 Alternative Stylesheet for Listing 11.15

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:    <xsl:strip-space elements="*" />
7:
8:    <xsl:variable name="extensions">
9:      <ext ext="doc">Word Document</ext>
10:     <ext ext="dwg">AutoCad Drawing</ext>
11:     <ext ext="xml">XML Document</ext>
12:     <ext ext="xsl">XSL Stylesheet</ext>
13:   </xsl:variable>
14:
15:   <xsl:template match="/">
16:     <xsl:apply-templates />
17:    </xsl:template>
18:
19:   <xsl:template match="folder">
20:     <xsl:value-of select="concat ('&lt;',@name,'&gt;&#xA;')" />
21:     <xsl:apply-templates />
22:     <xsl:text>&#xA;</xsl:text>
23:   </xsl:template>
24:
25:   <xsl:template match="file">
26:     <xsl:variable name="ext" select="substring-after (.,'.')" />
27:     <xsl:value-of select="concat (., '&#x20;')" />
28:     <xsl:if test="$extensions/ext[@ext=$ext]">
29:       <xsl:value-of select="concat (' (',$extensions/ext[@ext=$ext],')')" />
30:     </xsl:if>
31:     <xsl:text>&#xA;</xsl:text>
32:   </xsl:template>
33: </xsl:stylesheet>

ANALYSIS

Listing 11.17 takes a different road to achieve the same result as Listing 11.15. First, the template matching the file element starting on line 25 uses substring-after () to get the file’s extension. As you can see on line 26, the length is no longer needed, and the same result is achieved without getting the length of the string first. Because each file consists of a name and an extension separated by a period, getting the strings after the period works just fine.

A second change in Listing 11.17 is the way the file type is retrieved for the extension. Listing 11.15 used an xsl:choose element for this task. In Listing 11.17, a variable named extensions is created on line 8. This variable contains an element for each known extension. The template matching the file element now just checks whether the extension is known on line 28, and if the extension is known, it selects the right extension from the variable. To make the output complete, the file type is surrounded by parentheses using the concat () function. Both the check and output of the extension are performed using the expression $extensions/ext[@ext=$ext]. For the test, this expression returns true if at least one node matches this expression, and the output takes the first matching element. The advantage of this approach over that of Listing 11.15 is that you don’t have to add xsl:when elements; you can just add ext elements, which are much easier and clearer.

Listing 11.17 is clear-cut and obvious because Listing 11.14 doesn’t contain any values that have more than one period, such as names.out.xsl. In those cases, you need to be careful because both substring-before () and substring-after () operate on the first period only. Therefore, substring-before ('names.out.xsl','.') returns 'names', and substring-after ('names.out.xsl','.') returns 'out.xsl'. The latter, of course, does not work correctly with Listing 11.17.
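One possible way around this limitation is a named template that calls itself with the part after the first period until no period is left. The following is a sketch under that assumption, not part of Listing 11.17; the template name last-extension and its name parameter are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />
  <xsl:strip-space elements="*" />

  <xsl:template match="file">
    <xsl:value-of select="concat (., ': ')" />
    <xsl:call-template name="last-extension">
      <xsl:with-param name="name" select="string (.)" />
    </xsl:call-template>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

  <!-- Hypothetical helper: outputs the text after the last period -->
  <xsl:template name="last-extension">
    <xsl:param name="name" />
    <xsl:choose>
      <!-- Still a period left: continue with the text after it -->
      <xsl:when test="contains ($name, '.')">
        <xsl:call-template name="last-extension">
          <xsl:with-param name="name" select="substring-after ($name, '.')" />
        </xsl:call-template>
      </xsl:when>
      <!-- No period left: what remains is the extension -->
      <xsl:otherwise>
        <xsl:value-of select="$name" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

For names.out.xsl, the template calls itself with out.xsl and then with xsl, which is written to the output. Note that a name without any period is returned unchanged, so you may want to check for that case with contains () before calling the template.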

Replacing Parts of a String

Checking strings for contents, length, and so on is all very well, but you also may want to replace sections of strings or certain characters. You could replace parts the hard way and use substring-before () and substring-after () to get the job done, but you can also do it the easy way: by using the versatile function translate ().

The translate () function works differently from the other functions discussed so far. This function has three arguments:

• The original string value

• A string containing the characters that need to be replaced

• A string containing characters that should be used to replace the characters given in the second argument

The characters given in the second argument are not treated as a sequence of characters to be searched for, but each character is searched for separately and replaced by the character given to replace it. This is the character that has the same position in the third argument. So, translate ('abc','ac','AC') returns 'AbC'. As you can see, the letters a and c have been replaced by A and C, which have the same position in the third argument as a and c have in the second argument. Because the replacement is performed on a character-by-character basis instead of on a sequence of characters, the rules surrounding the translate () function are very important.

The second argument holds all the characters that need to be replaced in the source value. Each character in the source value that does not occur in that set of characters is copied to the destination as is. If the character occurs at a certain position in the list of characters to be replaced, that character is replaced with a character in the same position in the third argument, which contains the replacement characters. Hence, if the earlier expression had been translate ('abc','ac','CA'), the result would have been 'CbA'.

What happens when the third argument is shorter than the second? In that case, no character appears in the same position as in the list of characters to be replaced, so no character is sent to the output. For example, translate ('abc','ac','A') returns 'Ab'. This also means that if no replacement characters are given at all, all characters that need to be replaced are omitted from the result. If the list of replacement characters is longer than the list of characters to be replaced, the additional characters are ignored. Also, if a character occurs more than once in the list of characters to be replaced, only the first occurrence is used. Subsequent occurrences, as well as their replacement characters, are ignored. So, for example, translate ('abc','aa','AC') returns 'Abc'.
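The rule about missing replacement characters is handy for stripping unwanted characters from a value. The following sketch, which runs against any XML source, uses a made-up phone number and a made-up price as sample values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- 5551234567: the third argument is empty, so parentheses,
         hyphens, and spaces are all dropped from the result -->
    <xsl:value-of select="translate ('(555) 123-4567', '()- ', '')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- 8.95: a single character replaced by another -->
    <xsl:value-of select="translate ('8,95', ',', '.')" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```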

Listing 11.18 shows how you can use the translate () function.

LISTING 11.18 Using translate () to Create Uppercase Strings

1:  <xsl:template match="file">
2:    <xsl:variable name="ext" select="substring-after (.,'.')" />
3:    <xsl:variable name="file"
4:          select="translate (.,'abcdefghijklmnopqrstuvwxyz'
5:                             ,'ABCDEFGHIJKLMNOPQRSTUVWXYZ')" />
6:    <xsl:value-of select="concat ($file, '&#x20;')" />
7:    <xsl:if test="$extensions/ext[@ext=$ext]">
8:      <xsl:value-of select="concat (' (',$extensions/ext[@ext=$ext],')')" />
9:    </xsl:if>
10:   <xsl:text>&#xA;</xsl:text> 
11: </xsl:template>

ANALYSIS

Listing 11.18 shows a part of Listing 11.17, but with a change so that all filenames are displayed in uppercase. For this purpose, a variable named file is created on line 3. Its value is set using the translate () function, which specifies that all lowercase characters from a to z have to be replaced by their uppercase counterparts. On line 6, the value of the file variable is written to the output. You can see the result in Listing 11.19.

OUTPUT

LISTING 11.19 Result from Applying Listing 11.18 to Listing 11.14

<TYXSLT21>
ADRESSES.MDB
BASKET.DOC  (Word Document)
HOUSE.DWG  (AutoCad Drawing)
NAMES.XML  (XML Document)
NAMESOUT.XSL  (XSL Stylesheet)

Note

This sample leaves accented and other special characters out of the equation. If you need a function that creates an uppercase string for every character, you have to add all these characters to both character lists, too. In that case, it might be a good idea to create a called template with a parameter, named touppercase, for instance.
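One possible shape for such a called template is sketched below. The template name touppercase, the parameter name text, the variable names lower and upper, and the sample value are all assumptions made for this illustration. The accented characters listed are only a small selection, so extend both lists in step for the languages you need.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Both lists must keep each character at the same position -->
  <xsl:variable name="lower"
        select="'abcdefghijklmnopqrstuvwxyzàáâäçèéêëïñöü'" />
  <xsl:variable name="upper"
        select="'ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÄÇÈÉÊËÏÑÖÜ'" />

  <!-- Hypothetical helper: uppercases the value passed in as text -->
  <xsl:template name="touppercase">
    <xsl:param name="text" select="." />
    <xsl:value-of select="translate($text, $lower, $upper)" />
  </xsl:template>

  <xsl:template match="/">
    <!-- Writes CAFÉ.DOC -->
    <xsl:call-template name="touppercase">
      <xsl:with-param name="text" select="'café.doc'" />
    </xsl:call-template>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Keeping the two character lists in top-level variables means the lists are written only once, no matter how many templates need uppercase output.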

Formatting Data

If you’re creating certain output, you might want to format certain values in a specific way. You may, for instance, want to format a number as currency or with a specific number of decimals. Or you may want to format a date according to a certain country’s conventions. Because these data types don’t exist by themselves, you have to rely on functions for formatting numbers and manipulating strings to get the job done.

Formatting Numbers

If you’re working with numbers, you probably want to control what the output looks like. Especially with numbers with many digits after the decimal point, you might want to restrict the number of digits actually displayed. In addition, you might want to display numbers in a format supported by a specific country or region. To format numbers, XSLT provides the function format-number (), which uses the pattern you provide to format a number. This pattern defines how many decimals there should be; if any groupings should be used for thousands, millions, and so on; and what happens when a number is negative.

The basic building blocks of a pattern are 0 for a mandatory digit and # for an optional digit. In addition, a period is used to specify the position of the decimal point, and a comma is used for grouping. Table 11.1 shows some examples of common patterns and their result.

TABLE 11.1 Decimal Formatting Patterns

[Table image not available]

As you can see in Table 11.1, the patterns are quite simple to create. Although the samples in this table are far from complete, they should give you the general idea on what patterns do and how you can create your own.
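As a rough illustration of how the 0, #, period, and comma building blocks combine, consider the following sketch. The numbers and patterns are made-up examples and are not taken from Table 11.1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- Grouping plus two mandatory decimals: writes 1,234.50 -->
    <xsl:value-of select="format-number(1234.5, '#,##0.00')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- One mandatory decimal, value is rounded: writes 1234.6 -->
    <xsl:value-of select="format-number(1234.567, '0.0')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- Three mandatory digits pad with zeros: writes 005 -->
    <xsl:value-of select="format-number(5, '000')" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```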

Caution

Scientific notation is not supported in XSLT 1.0. An earlier release of MSXML did support scientific notation, but this support was removed in later releases. Both MSXML and Xalan now report an error, whereas Saxon silently produces incorrect output.

Besides providing the pattern for a number, you also can add a prefix and suffix to the pattern. This capability is useful for working with percentages, currencies, and other known formats that have a specific meaning (such as a bank balance). The characters you can use in a prefix or suffix are bound to the same rules as normal text. The characters that define a pattern, such as # and 0, are a problem here because they have to be placed between single quotation marks to be treated as literal text. However, because the pattern itself is likely to be between single quotation marks already, that is not possible. The xsl:decimal-format element provides a way around this problem, as I will show you a little later. Table 11.2 shows some common examples for prefixes and suffixes.

TABLE 11.2 Numbers Formatted with a Prefix and/or Suffix

[Table image not available]

As you can see in Table 11.2, currencies greatly benefit from the use of prefixes and suffixes. Note that you don’t need to put spaces between the number and the prefix or suffix. A downside is the position of the minus symbol to denote negative amounts. Instead of appearing after a prefix, it ends up before a prefix, which might be confusing. An option here is to insert the currency symbol in a separate xsl:text element so that it will always appear in front of the numbers and minus symbol.
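The effect of a prefix or suffix, including the position of the minus symbol, can be sketched as follows. The amounts and patterns are made-up examples and are not taken from Table 11.2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- Currency prefix: writes $1,234.50 -->
    <xsl:value-of select="format-number(1234.5, '$#,##0.00')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- The minus symbol lands before the prefix: writes -$1,234.50 -->
    <xsl:value-of select="format-number(-1234.5, '$#,##0.00')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- Percent suffix multiplies by 100: writes 25.6% -->
    <xsl:value-of select="format-number(0.256, '0.0%')" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```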

Another option is the final weapon in the arsenal of the number pattern: being able to create two different patterns for positive and negative numbers. This allows you, among other things, to explicitly put the minus sign in front of the currency symbol if you want to do so. Another good example that can benefit from this use is numbers in a balance, where a difference needs to be shown between a positive and negative balance. Some samples are shown in Table 11.3.

TABLE 11.3 Numbers Formatted Differently for Positive and Negative Numbers

[Table image not available]
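A minimal sketch of a two-part pattern follows. It assumes the default pattern separator, the semicolon, with the pattern for positive numbers on the left and the pattern for negative numbers on the right. The accounting-style parentheses are just one made-up choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- Positive value uses the first pattern: writes $1,234.50 -->
    <xsl:value-of
          select="format-number(1234.5, '$#,##0.00;($#,##0.00)')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- Negative value uses the second pattern: writes ($1,234.50) -->
    <xsl:value-of
          select="format-number(-1234.5, '$#,##0.00;($#,##0.00)')" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

When a separate negative pattern is given, no minus symbol is added automatically, so the negative pattern has to contain whatever marker you want to see.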

Localization

So far, the format-number () function has been used with two arguments: the number to be formatted and the formatting pattern. Unfortunately, this means that the output is generated in a numeric format in which the decimal separator is a period and the grouping separator a comma. Some countries use a different notation, with the comma serving as decimal separator and the period as grouping separator. To control these settings, you can use the xsl:decimal-format element to create a named decimal format that you can pass as a third argument to the format-number () function.

The xsl:decimal-format element is a top-level element with only attributes. You can define an alternate numeric format like this:

<xsl:decimal-format name="EU" decimal-separator="," grouping-separator="." />

This line defines a number format with the decimal separator and grouping separator as used in the European number format. When you pass this format to the format-number () function, you need to create the pattern in this format. Table 11.4 shows how this change would affect the values in Table 11.1.

TABLE 11.4 European Decimal Formatting Patterns

[Table image not available]

As you can see in Table 11.4, the value you pass along to the format-number () function is the same, but the patterns now have commas where there were periods, and vice versa. Each time, the EU number format is passed along as well. The results are now in European decimal format, except for Infinity and NaN, which are still the same.
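Put together, a named decimal format and a matching pattern might look like the following sketch. The number is a made-up example. Note that the pattern itself is written with the period as grouping separator and the comma as decimal separator.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:decimal-format name="EU" decimal-separator=","
        grouping-separator="." />

  <xsl:template match="/">
    <!-- Pattern and output both use the EU separators:
         writes 1.234,50 -->
    <xsl:value-of select="format-number(1234.5, '#.##0,00', 'EU')" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```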

Changing Special Number Values

As shown in the preceding tables, the default output for special number values, such as Infinity, stays the same with different decimal formats. You can, however, use xsl:decimal-format to change these values as well. You can use the infinity (note that this is not capitalized) and NaN attributes to change the values. So, this line of code

<xsl:decimal-format name="special" infinity="&#8734;" NaN="Invalid" />

yields Invalid for values that are not a number and ∞ for infinity.

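You also can change the minus sign by using the minus-sign attribute. Note that the XSLT 1.0 specification defines the value of this attribute as a single character, so a longer value such as the one in the following line may be rejected by stricter processors: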

<xsl:decimal-format name="minus" minus-sign="NEGATIVE " />

Now, using the format-number () function

format-number (-1234.56,'#,##0.0','minus')

yields NEGATIVE 1,234.6 as a value.

Changing Default Pattern Characters

As I mentioned earlier, the characters used to specify a pattern, such as # and 0, are hard to get into a prefix or suffix. To get around this problem, you can change the special characters used in the pattern. You could, for instance, change the # character into @ by using the following decimal format:

<xsl:decimal-format name="pound" digit="@" />

Now the format-number () function can use the # character in a prefix like

format-number (1234.56,'#@,@@0.0','pound')

to get the output #1,234.6.

You can change the whole set of characters shown in Table 11.5.

TABLE 11.5 Attributes to Change Pattern Characters in format-number ()

[Table image not available]

Setting the Default Format

You can create one xsl:decimal-format element without a name. In that case, you are overriding the default number format for the stylesheet. This capability is handy if you know that the whole stylesheet has to be in European format, for instance. You change the format like this:

<xsl:decimal-format decimal-separator="," grouping-separator="." />

If you now want to have a value in U.S. format, you need to make a named decimal format and use it with the format-number () function. If you do not specify a named format with the format-number () function, you end up with European notation.
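The interplay between an unnamed default format and a named one can be sketched like this. The name US and the number are assumptions chosen for the illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Unnamed: becomes the default for the whole stylesheet -->
  <xsl:decimal-format decimal-separator="," grouping-separator="." />

  <!-- Named: brings back the U.S. separators on request -->
  <xsl:decimal-format name="US" decimal-separator="."
        grouping-separator="," />

  <xsl:template match="/">
    <!-- No format name, so the European default applies:
         writes 1.234,50 -->
    <xsl:value-of select="format-number(1234.5, '#.##0,00')" />
    <xsl:text>&#xA;</xsl:text>
    <!-- Named U.S. format: writes 1,234.50 -->
    <xsl:value-of select="format-number(1234.5, '#,##0.00', 'US')" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```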

Formatting Date and Time

XSLT does not have a date and/or time data type. Also, no functions have been specifically created to work on date and time values. Basically, when you’re working with date and time, you can create your own format for use in XML source documents. As long as all documents use the same date/time format you choose, you can use string or number functions to extract the date and time and display the appropriate format.

XML Schema does have a dateTime type, which stores date/time values. It is nothing more than a string conforming to a set of rules. A typical dateTime value under the XML Schema rules looks like this:

2001-09-27T13:20:00-05:00

The numbers in front of the T represent the date, and the numbers after the T represent the time. The date format is YYYY-MM-DD. Note that the year has to be four digits, and month and day have to be two digits. The - character is used as separator. The time format is HH:MM:SS, with each value written as two digits. After the time, notice that another time is listed, without seconds and preceded by a minus symbol. This value represents the time zone as an offset from Greenwich Mean Time (GMT). For example, -05:00 means GMT minus five hours, which is Eastern Standard Time, and +01:00 indicates Central European Time. The time zone is optional, however.

By using the format used in XML Schema, you can easily write templates that output the date and/or time in the format you want. Because this format is the most widely used date/time format, you would be wise to stick to it. Products such as SQL Server 2000 and Oracle 9i create XML from the tables in a database using this format.

The stylesheet in Listing 11.20 formats the date of a date/time value in XML Schema dateTime format.

LISTING 11.20 Stylesheet Formatting Date

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:
7:    <xsl:variable name="monthnames">
8:      <month number="1">January</month>
9:      <month number="2">February</month>
10:     <month number="3">March</month>
11:     <month number="4">April</month>
12:     <month number="5">May</month>
13:     <month number="6">June</month>
14:     <month number="7">July</month>
15:     <month number="8">August</month>
16:     <month number="9">September</month>
17:     <month number="10">October</month>
18:     <month number="11">November</month>
19:     <month number="12">December</month>
20:   </xsl:variable>
21:
22:   <xsl:template match="/">
23:     <xsl:call-template name="formatdate" />
24:   </xsl:template>
25:
26:   <xsl:template name="formatdate">
27:     <xsl:param name="datetime" select="." />
28:     <xsl:variable name="year" select="substring-before ($datetime,'-')" />
29:     <xsl:variable name="month" select="number (substring ($datetime,6,2))" />
30:     <xsl:variable name="day" select="substring ($datetime,9,2)" />
31:     <xsl:text>Today is </xsl:text>
32:     <xsl:value-of select="$monthnames/month[@number=$month]" />
33:     <xsl:text> </xsl:text><xsl:value-of select="$day" />
34:     <xsl:text>, </xsl:text><xsl:value-of select="$year" />
35:     <xsl:text>.</xsl:text>
36:   </xsl:template>
37: </xsl:stylesheet>

ANALYSIS

In Listing 11.20, a called template starting on line 26 tells you today’s date. That date has to come from an XML source, which is shown in Listing 11.21. The template has been made generic by creating a parameter on line 27 that takes the context element if no parameter is specified by the caller (as is the case here). That parameter is then dissected into separate variables named day, month, and year by using some of the string functions discussed earlier. Also, note that the month is converted to a number just to be on the safe side. The month number is used to select the month name from the variable monthnames, which is created on line 7. This selection is performed on line 32, which contains an expression checking the current month number against the number in the variable and displaying the one that matches. The rest of the elements surrounding it generate the text to make it look nice. Applying Listing 11.20 to Listing 11.21 yields the result shown in Listing 11.22.

LISTING 11.21 XML Source with a Date

<?xml version="1.0" encoding="UTF-8"?>
<date>2001-09-27T13:20:00-05:00</date>

OUTPUT

LISTING 11.22 Result from Applying Listing 11.20 to Listing 11.21

Today is September 27, 2001.
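The same dissecting technique works for other date notations. The following sketch, which is not one of the book's listings, writes the date from Listing 11.21 in a numeric day/month/year notation. The choice of that notation and the direct match on the date element are assumptions made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="date">
    <!-- Fixed positions in YYYY-MM-DD: year 1-4, month 6-7, day 9-10 -->
    <xsl:variable name="year" select="substring(., 1, 4)" />
    <xsl:variable name="month" select="substring(., 6, 2)" />
    <xsl:variable name="day" select="substring(., 9, 2)" />
    <!-- For 2001-09-27T13:20:00-05:00 this writes 27/09/2001 -->
    <xsl:value-of select="concat($day, '/', $month, '/', $year)" />
  </xsl:template>
</xsl:stylesheet>
```

Because month and day are kept as strings here rather than converted to numbers, the leading zeros survive in the output.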

Formatting Other Data

By now, the general idea should be clear to you: There are no data types other than strings and numbers, so you have to manually format data with a specific meaning. This puts quite a bit of responsibility in your hands when you’re creating output. If you want the output to be viewable in some way, you just format it for the output. However, when you’re transforming XML for communication or data storage purposes, you also can change the format, based on what the target system expects. Because no separate data types exist, everybody can create his or her own, giving rise to incompatibility. Fortunately, XML Schema is likely to stimulate some form of uniformity, and although XSLT itself does not support XML Schema, it does benefit from this uniformity (or rather you do when writing XSLT).

Summary

Today you learned that although XSLT contains only a few data types, you can use them to create values that contain data corresponding to a data type. Using the functions that are available in XSLT to manipulate strings and numbers, you can format this data as it should be in the output. A good example is the dateTime type, which is defined in XML Schema but is not supported in XSLT. You can use this data type and dissect the value to show the date and time in a format that you want.

The functions substring (), contains (), and translate () all have an important role in these processes. In addition, format-number () is very important for number output for different countries and different formats for different purposes, such as accounting, computing, and so on. Other functions such as concat () could be omitted from XSLT because they can easily be simulated with other constructs. These functions, however, make writing expressions much easier and make way for shorter and more understandable stylesheets.

Tomorrow you will learn about sorting and numbering node-sets. You will expand upon the knowledge about number formatting you learned today to include other types of numbering, such as with letters or Roman numerals.

Q&A

Q Why isn’t there a function ends-with (), to complement starts-with ()?

A The people who created the XSLT (or actually XPath) specification obviously didn’t think it was necessary because you can create it yourself by using substring () and string-length (). Note that the same goes for starts-with ().
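One way to simulate the missing function is sketched below. The variable names name and suffix and their values are made up for the illustration. The expression takes the tail of the string that is as long as the suffix and compares the two.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:variable name="name" select="'basket.doc'" />
  <xsl:variable name="suffix" select="'.doc'" />

  <xsl:template match="/">
    <!-- Start position is 10 - 4 + 1 = 7, so the tail is '.doc'.
         Writes true -->
    <xsl:value-of select="substring($name,
          string-length($name) - string-length($suffix) + 1)
          = $suffix" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```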

Q I want to replace sequences of characters with other sequences. The translate () function seems to be ill suited for this task. What do I use?

A Indeed, translate () isn’t too handy if you want to replace sequences of characters. You need a called template that calls itself and combines contains (), substring-before (), and substring-after () to pull off this task. This is one of the issues that XSLT 2.0 might address.
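A minimal sketch of such a template follows. The template name replace, the parameter names text, search, and with, and the sample sentence are all assumptions made for this illustration. The template writes everything before the first match, writes the replacement, and then calls itself on the remainder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Hypothetical helper: replaces every occurrence of search -->
  <xsl:template name="replace">
    <xsl:param name="text" />
    <xsl:param name="search" />
    <xsl:param name="with" />
    <xsl:choose>
      <!-- Guard against an empty search string, which would
           otherwise match forever -->
      <xsl:when test="$search != '' and contains($text, $search)">
        <xsl:value-of select="substring-before($text, $search)" />
        <xsl:value-of select="$with" />
        <xsl:call-template name="replace">
          <xsl:with-param name="text"
                select="substring-after($text, $search)" />
          <xsl:with-param name="search" select="$search" />
          <xsl:with-param name="with" select="$with" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <!-- Writes one, two, three -->
    <xsl:call-template name="replace">
      <xsl:with-param name="text" select="'one and two and three'" />
      <xsl:with-param name="search" select="' and '" />
      <xsl:with-param name="with" select="', '" />
    </xsl:call-template>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```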

Q Being able to display different currencies is nice, but can I also do currency conversions?

A Yes, you can. You will learn more about this topic on Day 18, “Building Computational Stylesheets.”

Q Are there any other date/time notations in use?

A Yes. Some types have been defined formally; others have not. The XML Schema notation is very common in XML documents. For all intents and purposes, you can view it as the standard to be used in any XML or XSLT document, even if no schema is attached to it.

Workshop

This workshop tests whether you understand all the concepts you learned today. It is very helpful to know and understand the answers before starting tomorrow’s lesson. You can find the answers to the quiz questions and exercises in Appendix A.

Quiz

1. True or False: contains () returns a number with the position of the first occurrence of the string searched for.

2. True or False: String manipulation functions can be used only in expressions.

3. Determine the outcome of the following expression:

  substring-after ('abcxdefxgh','x')

4. Determine the outcome of the following expression:

  translate ('abcxdefxgh','cfx','||')

5. How can you create different number formats for positive and negative values?

Exercises

1. Change Listing 11.20 so that it also shows the time as The time is 13:20 hours and 00 seconds in timezone -05:00.

2. Create an XML file with several numbers and create a stylesheet that displays the values in different number formats. Experiment with different decimal formats.
