16 Advanced Data Selection

WEEK 3 Day 16
Advanced Data Selection

Yesterday you learned that using namespaces is a great way to structure documents and separate colliding vocabularies. Although namespaces make your documents more complex, they also keep you from processing elements of different vocabularies the same way, which is often not what you want.

Today’s lesson takes a closer look at selecting data with expressions. This topic was covered in the first week, but there you only scratched the surface. With each lesson since then, your understanding of expressions and data selection has grown. Now it’s time to go back to them for the finishing touch. Today you will build on your current knowledge to learn more details about expressions, as well as some additional selection techniques available to you.

In today’s lesson, you will learn the following:

• How to use implicit data type conversions to your advantage

• How to select unique values from a node-set with duplicate values

• How you can use keys to access data quickly

• How to work with unique ID values

More About Expressions

Expressions were discussed on Day 3, “Selecting Data,” and have been used throughout this book. So, why come back to them now? Expressions and patterns are the most important part of XSLT because they define which data you are actually working with. Without expressions and patterns, XSLT wouldn’t be worth anything. In this section, I will discuss some characteristics of expressions. You have been introduced implicitly to most of these characteristics, but it is useful to make you fully aware of them.

There are basically two types of expressions: one used to match or select some data, the other to compare data. Of course, these two are mixed in many cases, because when you compare data, you have to select data as well. Also, when you select data, you can use predicates that make a comparison to select certain data.

To discuss expressions, this section uses the familiar source document shown in Listing 16.1.

LISTING 16.1 Sample XML Document

<?xml version="1.0" encoding="UTF-8"?>
<menu>
  <appetizers title="Work up an Appetite">
    <dish id="1" price="8.95">Crab Cakes</dish>
    <dish id="2" price="9.95">Jumbo Prawns</dish>
    <dish id="3" price="10.95">Smoked Salmon and Avocado Quesadilla</dish>
    <dish id="4" price="6.95">Ceasar Salad</dish>
  </appetizers>
  <entrees title="Chow Time!">
    <dish id="5" price="19.95">Grilled Salmon</dish>
    <dish id="6" price="17.95">Seafood Pasta</dish>
    <dish id="7" price="16.125">Linguini al Pesto</dish>
    <dish id="8" price="18.95">Rack of Lamb</dish>
    <dish id="9" price="16.125">Ribs and Wings</dish>
  </entrees>
  <desserts title="To Top It Off">
    <dish id="10" price="6.95">Dame Blanche</dish>
    <dish id="11" price="5.95">Chocolat Mousse</dish>
    <dish id="12" price="6.95">Banana Split</dish>
  </desserts>
</menu>

Note

You can download the sample listings in this lesson from the publisher’s Web site.

Matching and Selecting Data

Let’s look first at match and select expressions without predicates that make comparisons. Basically, such an expression is a location path. The result of that location path depends on how specific the location path is. For instance, the expression /menu/entrees/dish[1]/text() selects at most one node, so in practice it yields either nothing or a single string value. The same goes for any location path that selects an attribute of a specific element in a source document, for instance /menu/entrees/@title. Working with expressions starts to become tricky when more elements match your expression, such as /menu/entrees/dish/text() or /menu/*/@title. Although these expressions point to a text value, the location steps before the text() node test and the attribute selection match multiple nodes, so the result is not a single string but a node-set with several nodes.

For the second example, that result is more or less clear because it uses a wildcard. The first one is less obvious, however, because each element is spelled out completely. For a match expression, that isn’t such a problem. After all, you created a template to deal with nodes matching the expression, no matter how many. When you’re selecting data, it can be a problem, though, because you are expecting a string, but a node-set is returned.

Still, the expression will not fail, nor will the entire code. If a string is expected, for instance by a function argument or by xsl:value-of, the node-set is converted implicitly to a string. As discussed on Day 10, “Understanding Data Types,” this results in the value of the first node in document order being used and converted to a string if necessary. This means that /menu/entrees/dish[1]/text() and /menu/entrees/dish/text() actually yield the same string result, even though the former selects a single node and the latter a node-set with several nodes. When you create expressions, you are bound to run into such implicit conversions.
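The difference between the two paths can be made visible with a small stylesheet. The following is a minimal sketch, not one of the numbered listings, and it assumes that it is applied to Listing 16.1 with an XSLT 1.0 processor. It uses count() to show how many nodes each path really selects and xsl:value-of to show the string that results from the conversion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- Specific path: selects exactly one text node in Listing 16.1 -->
    <xsl:value-of select="count(/menu/entrees/dish[1]/text())" />
    <xsl:text>: </xsl:text>
    <xsl:value-of select="/menu/entrees/dish[1]/text()" />
    <xsl:text>&#xA;</xsl:text>

    <!-- Broader path: selects five text nodes, but xsl:value-of
         converts the node-set to a string using only the first node -->
    <xsl:value-of select="count(/menu/entrees/dish/text())" />
    <xsl:text>: </xsl:text>
    <xsl:value-of select="/menu/entrees/dish/text()" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Both output lines should end in Grilled Salmon, preceded by the counts 1 and 5 respectively. The counts are the only visible sign that the second path selected more than it printed.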

Comparing Values

Implicit conversion isn’t all bad. In fact, it can be of great benefit because you can quickly check whether a value exists. So, especially in comparisons, implicit conversion can be quite useful. Whether you actually use implicit conversion in comparisons is a matter of choice. If you don’t want to run into any unforeseen trouble, converting every value explicitly is the better choice. The downside is that doing so probably decreases the performance of your stylesheet because implicit conversion is bound to be more efficient.

A great way to make use of implicit conversion is in test expressions of xsl:if and xsl:when elements. If you want to check whether some node or nodes exist, you can use the following code:

<xsl:if test="/menu/desserts/dish">

This test expression looks a little weird, but it actually tests to see whether there are any desserts. If the desserts element has no dish child elements, the node-set returned by the expression is empty. This is implicitly converted to a Boolean value, which is the data type the test attribute expects. When this expression yields an empty node-set, the converted value is false; otherwise, it is true. This means that if there are any desserts, the code block inside the xsl:if element is processed; otherwise, it is not.
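To see the existence test at work, you could wrap it in a complete stylesheet. The following sketch assumes Listing 16.1 as the source; the soups element in the second test is hypothetical and deliberately absent from that listing. The second test also shows the explicit alternative, boolean(), which gives the same answer as the implicit conversion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <xsl:choose>
      <!-- Implicit conversion: a non-empty node-set counts as true -->
      <xsl:when test="/menu/desserts/dish">
        <xsl:text>Desserts on the menu: </xsl:text>
        <xsl:value-of select="count(/menu/desserts/dish)" />
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>No desserts today</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#xA;</xsl:text>

    <!-- Explicit conversion: boolean() states the intent openly.
         /menu/soups is a made-up path that matches nothing in
         Listing 16.1, so the node-set is empty and converts to false -->
    <xsl:if test="not(boolean(/menu/soups/dish))">
      <xsl:text>No soups today&#xA;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With Listing 16.1, the first branch should report three desserts, and the second test should print its message because no soups element exists.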

Another useful feature of comparisons is that you can compare a simple value, such as a Boolean value, number, or string, to a node-set. If the node-set contains a node that has the value you require, the expression returns true. So, the comparison /menu/*/dish = 'Ribs and Wings' returns true for Listing 16.1.
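Because a comparison with a node-set is true as soon as any one node satisfies it, the != operator can behave in a way you might not expect. The following sketch, again assuming Listing 16.1 as the source, writes the result of several comparisons to the output; the dish name Pizza is made up and does not occur in the listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- true: at least one dish has this string value -->
    <xsl:value-of select="/menu/*/dish = 'Ribs and Wings'" />
    <xsl:text>&#xA;</xsl:text>

    <!-- false: no dish has the made-up value Pizza -->
    <xsl:value-of select="/menu/*/dish = 'Pizza'" />
    <xsl:text>&#xA;</xsl:text>

    <!-- true: each price is converted to a number, and at least
         one of them (19.95) is greater than 19 -->
    <xsl:value-of select="/menu/*/dish/@price &gt; 19" />
    <xsl:text>&#xA;</xsl:text>

    <!-- true as well: at least one dish has a different value,
         so != is not the opposite of = for node-sets -->
    <xsl:value-of select="/menu/*/dish != 'Ribs and Wings'" />
    <xsl:text>&#xA;</xsl:text>

    <!-- false: not() around = is the way to ask whether no dish
         has this value -->
    <xsl:value-of select="not(/menu/*/dish = 'Ribs and Wings')" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The output should read true, false, true, true, false, one value per line. If you want to know whether a value is absent from a node-set, wrapping the = comparison in not() is the safer habit.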

Selecting Distinct Values

In documents where elements have multiple attributes, it is not uncommon for several elements to share the same value for one of those attributes. Hence, selecting that attribute across all elements yields a node-set with nondistinct values, because some values occur more than once in the node-set. Listing 16.2 should make this clearer.

LISTING 16.2 XML Source with Duplicate Manufacturers

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <cars>
3:    <model name="Golf" manufacturer="Volkswagen" year="1999" />
4:    <model name="Camry" manufacturer="Toyota" year="1999" />
5:    <model name="Focus" manufacturer="Ford" year="2000" />
6:    <model name="Civic" manufacturer="Honda" year="2000" />
7:    <model name="Prizm" manufacturer="Chevrolet" year="2000" />
8:    <model name="Celica" manufacturer="Toyota" year="2000" />
9:    <model name="Mustang" manufacturer="Ford" year="2001" />
10:   <model name="Passat" manufacturer="Volkswagen" year="2001" />
11:   <model name="Accord" manufacturer="Honda" year="2002" />
12:   <model name="Corvette" manufacturer="Chevrolet" year="2002" />
13: </cars>

ANALYSIS

In Listing 16.2, each model element is unique because the combination of all the attributes for each element is unique. If you look closely at all the manufacturer attributes, you’ll notice that each manufacturer is listed twice. This means that the expression /cars/model/@manufacturer yields a node-set with 10 nodes, but only 5 distinct manufacturers. So, if you list all the manufacturers, you get a list with 10 manufacturers, with each one duplicated once. If you want to list only the manufacturers, you probably don’t want those duplicated values, so you need a way to select only the distinct values of the manufacturer attribute. Listing 16.3 shows how you can select the distinct values.

LISTING 16.3 Stylesheet Selecting Distinct Manufacturers

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:
7:    <xsl:template match="/">
8:      <xsl:for-each select="//model[not(@manufacturer =
9:           preceding-sibling::model/@manufacturer)]/@manufacturer">
10:       <xsl:sort select="." />
11:       <xsl:value-of select="concat(., '&#xA;')" />
12:     </xsl:for-each>
13:   </xsl:template>
14: </xsl:stylesheet>

ANALYSIS

Listing 16.3 is very simple. The template on line 7 matches the root of the source document. This template iterates through the manufacturers with the xsl:for-each element on line 8. For each manufacturer, the value is written to the output. Line 10 sorts the manufacturers before sending them to the output. The most interesting thing about Listing 16.3 is the select expression of the xsl:for-each element on line 8. This expression spans two lines. The expression selects all the model elements in the source document and uses a predicate to filter out elements with duplicate manufacturer attributes. After the filter is applied, the manufacturer attribute is selected for each node that passed the filter. The filter expression checks whether the value of the manufacturer attribute of the current element is equal to the value of a manufacturer attribute of any of the preceding model elements. If it is equal, the manufacturer has already been selected, so the current one is a duplicate. The not() function inverts that result, so the predicate is false for a duplicate and the duplicate value isn’t sent to the output.

When line 5 of Listing 16.2 is evaluated, the value Ford is compared to the values Toyota from line 4 and Volkswagen from line 3. Both are different, so Ford is sent to the output. The Ford value on line 9 is compared to the values of the manufacturer attributes on lines 3–8. Because line 5 already has a value Ford, the expression @manufacturer = preceding-sibling::model/@manufacturer yields true, so the model element on line 9 is completely ignored. That Listing 16.3 yields only a unique list of manufacturers is shown in Listing 16.4.

OUTPUT

LISTING 16.4 Result from Applying Listing 16.3 to Listing 16.2

Chevrolet
Ford
Honda
Toyota
Volkswagen

Getting the duplicate values out of Listing 16.2 is a matter of using the right axis. You’ll find that when you have to compare values with values of other nodes, axes can do a lot that you can’t do with comparisons of values because the current context gets in the way.
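Which axis is the right one depends on the shape of the document. The preceding-sibling axis works for Listing 16.2 because all model elements share one parent. In Listing 16.1, the dish elements are spread over three parent elements, so a duplicate price in another section would slip through. The following sketch, which assumes Listing 16.1 as the source, swaps in the preceding axis to look at all earlier dish elements regardless of their parent.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- Keep a dish only if no earlier dish anywhere in the
         document has the same price, then select its price -->
    <xsl:for-each
        select="//dish[not(@price = preceding::dish/@price)]/@price">
      <!-- Sort as numbers so that 10.95 does not come before 5.95 -->
      <xsl:sort select="." data-type="number" />
      <xsl:value-of select="concat(., '&#xA;')" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For Listing 16.1, this should produce nine distinct prices, running from 5.95 to 19.95. With preceding-sibling instead, the price 6.95 would be listed twice, once for the appetizers and once for the desserts. Keep in mind that the preceding axis looks at more nodes, so it is likely to be slower on large documents.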

Note

Familiarizing yourself again with axes is a good idea. Table 3.1 and Figure 3.5 in Day 3’s lesson show a reference that you can use when creating expressions that might benefit from using axes.

Working with Keys

In the preceding section, I reminded you that expressions can be quite long if you have to get to specific data. In addition, processors take longer to process a document if expressions are complex. Keys can help to simplify expressions and speed up processing.

What Is a Key?

A key is like an index in a book. In the index, you can look up a word. If that word exists in the book, the index tells you on which page or pages you can find it. A key is similar in that you can look up a value. If the value exists in the key index, you get a node-set containing the node or nodes that have the given value.

Before you can use an index in a book, it has to be compiled. The same is true for a key index in that it needs to be defined before you can use it. You have to define which nodes are to be indexed and the value or values that need to be used for the index. The nodes correspond to the pages in a book; the value or values, to the words found in the index. That you have to define the index is quite logical, because in a book you don’t want to index every single word, but just those words that relate to the subject matter. The value or values used to index the nodes can be the value of the nodes themselves, attribute values of the element’s attributes, or values of child elements.

When you have an index, you can look up a word and quickly go to the page containing that word. This approach is much quicker than having to check every page for the word you’re looking for, or having to go to a specific chapter and a specific section. The direct benefit of using keys is that you don’t have to write complex expressions to get to certain nodes. When you use keys, your expressions are likely to be shorter and easier to read. Whether you experience an actual performance benefit when processing is another matter. Some processors might choose to create an internal index while processing, which makes retrieval using keys perform better than using the expression that would be needed if the key didn’t exist. Other processors might translate the key to that expression and use it to retrieve the data instead. In that case, performance is unlikely to be any better, and the only benefit of using keys is that they make your life as a programmer easier.

If you’re using a specific processor, measuring the performance with and without keys is a good idea. If you measure performance, be sure to use source documents of different sizes. With a small source document, building the index might cost more time than it saves, whereas with a large document that cost is more likely to pay off. The result also depends on how much use you make of a key. If you use it only once, using an expression is probably a better idea, because building an internal index takes time as well.

Using Keys to Select Data

You can define a key by using the xsl:key element. This is a top-level element, so it can occur only as a child element of the xsl:stylesheet element. This makes sense because the key needs to be defined before it ever gets used, so before any processing of elements starts. You can use the xsl:key element as follows:

<xsl:key name="carkey" match="car" use="@name" />

You can see that the xsl:key element has three attributes, all of which are mandatory. The name attribute gives the key a name, which is necessary because you can define more than one key in a stylesheet. Having multiple keys is like having multiple indexes in a book, such as one with words and one with people, as is sometimes the case in scientific books. The second attribute is match, which defines the nodes that need to be indexed. The value of the match attribute should be a pattern identifying the nodes that should be indexed. As in the preceding example, the value will be an element name in most cases, but it might also match several elements or attributes. The last attribute is use, which tells the processor what it should use to index the matching nodes. In the preceding example, it is an attribute value, but it also can be an expression of some sort. This means that you can use the element’s value, the value of one of the child elements, or a combination of values.

After you define a key, you can select elements by using the key() function, which returns a node-set with the elements that match the given key. You can use key() in an expression by itself or as part of a larger expression. Its use is as follows:

key('carkey', 'Focus')

The two arguments of the key() function are mandatory. The first defines the key you want to use, and the second gives the value you’re looking for. Both are string values but can be created using an expression, so you can give dynamic values. Because this is the most common use for the value argument, the samples in this section make use of it. Listing 16.5 shows the source document used for the coming samples.
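Before turning to that document, here is a minimal sketch of a key defined and used in one stylesheet. It assumes Listing 16.2 as the source, so the key matches model elements rather than the car elements of the fragment shown earlier. The key name modelkey and the parameter wanted are hypothetical names chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Hypothetical key: index every model element by its name -->
  <xsl:key name="modelkey" match="model" use="@name" />

  <!-- Hypothetical parameter; the default applies when the caller
       passes no value -->
  <xsl:param name="wanted" select="'Focus'" />

  <xsl:template match="/">
    <!-- Lookup with a literal value -->
    <xsl:value-of select="key('modelkey', 'Focus')/@manufacturer" />
    <xsl:text>&#xA;</xsl:text>

    <!-- Lookup with a value supplied by an expression -->
    <xsl:value-of select="key('modelkey', $wanted)/@year" />
    <xsl:text>&#xA;</xsl:text>

    <!-- The location path that the key stands in for -->
    <xsl:value-of select="/cars/model[@name = $wanted]/@year" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the default parameter value, the output should be Ford followed by 2000 twice, because the last two expressions select the same node. The key version stays the same no matter where in the document the model elements live, whereas the location path has to spell out the route to them.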

LISTING 16.5 Sample XML with Cars and Manufacturers

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <car:cars xmlns:car="http://www.example.com/xmlns/car"
3:            xmlns:m="http://www.example.com/xmlns/manufacturer">
4:    <car:models>
5:      <car:model car:name="Golf" m:id="VW" car:year="1999" />
6:      <car:model car:name="Camry" m:id="TY" car:year="1999" />
7:      <car:model car:name="Focus" m:id="FO" car:year="2000" />
8:      <car:model car:name="Civic" m:id="HO" car:year="2000" />
9:      <car:model car:name="Prizm" m:id="CV" car:year="2000" />
10:     <car:model car:name="Celica" m:id="TY" car:year="2000" />
11:     <car:model car:name="Mustang" m:id="FO" car:year="2001" />
12:     <car:model car:name="Passat" m:id="VW" car:year="2001" />
13:     <car:model car:name="Accord" m:id="HO" car:year="2002" />
14:     <car:model car:name="Corvette" m:id="CV" car:year="2002" />
15:   </car:models>
16:   <m:manufacturers>
17:     <m:manufacturer m:id="VW" m:name="Volkswagen" m:country="Germany" />
18:     <m:manufacturer m:id="TY" m:name="Toyota" m:country="Japan" />
19:     <m:manufacturer m:id="FO" m:name="Ford" m:country="USA" />
20:     <m:manufacturer m:id="CV" m:name="Chevrolet" m:country="USA" />
21:     <m:manufacturer m:id="HO" m:name="Honda" m:country="Japan" />
22:   </m:manufacturers>
23: </car:cars>

ANALYSIS

Listing 16.5 is similar to a listing from yesterday’s lesson and familiar for the most part. Line 2 defines the root element, which uses the car namespace and declares the namespace. In addition, on line 3, the m namespace is declared, denoting manufacturer information. Each car:model element uses attributes with a namespace. The same goes for the m:manufacturer elements.

In some of the preceding lessons, you learned different ways to combine the car and manufacturer data with data similar to Listing 16.5. Listing 16.6 shows a stylesheet that combines data using a key.

LISTING 16.6 Stylesheet Using a Key

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
4:    xmlns:car="http://www.example.com/xmlns/car"
5:    xmlns:m="http://www.example.com/xmlns/manufacturer">
6:
7:    <xsl:output method="text" encoding="UTF-8" />
8:    <xsl:key name="mfc" match="m:manufacturer" use="@m:id" />
9:
10:   <xsl:template match="/">
11:     <xsl:apply-templates select="/car:cars/car:models" />
12:   </xsl:template>
13:
14:   <xsl:template match="car:models">
15:     <xsl:for-each select="car:model">
16:       <xsl:value-of select="key('mfc', @m:id)/@m:name" />
17:       <xsl:text> </xsl:text>
18:       <xsl:value-of select="@car:name" />
19:       <xsl:text> (</xsl:text>
20:       <xsl:value-of select="@car:year" />
21:       <xsl:text>)&#xA;</xsl:text>
22:     </xsl:for-each>
23:   </xsl:template>
24: </xsl:stylesheet>

ANALYSIS

On lines 4 and 5, Listing 16.6 declares the namespaces used in Listing 16.5. Line 8 uses these namespaces when it defines a key named mfc on the m:manufacturer elements. Because this pattern just gives an element name, the elements indexed might occur anywhere in the source document. They don’t need to occur at a specific depth or as child elements of some specific element. The key is defined using the m:id attribute, which means that you select elements based on the value of the m:id attribute. The key can also be combined from different nodes, in which case you need to search on the value of the combined nodes.

The template on line 10, which matches the root node, invokes other templates starting at the car:models element. This means that the m:manufacturers element and its child elements are never processed by a template. The template on line 14 matches the car:models element and then uses xsl:for-each on line 15 to iterate through all the car:model elements. Line 18 outputs the name of the car model, and line 20 outputs the year. The lines in between are just for nice formatting.

Line 16 uses the key() function to select the m:manufacturer element whose m:id attribute has the same value as the m:id attribute of the car:model element being processed. The key() function is part of a larger expression that immediately outputs the name of the manufacturer. Another option would have been to capture the result of the key in a variable and address it separately, but the idea of a key is that you have quick access, so you don’t need variables. Listing 16.7 shows the result when this stylesheet is applied to Listing 16.5.

OUTPUT

LISTING 16.7 Result from Applying Listing 16.6 to Listing 16.5

Volkswagen Golf (1999)
Toyota Camry (1999)
Ford Focus (2000)
Honda Civic (2000)
Chevrolet Prizm (2000)
Toyota Celica (2000)
Ford Mustang (2001)
Volkswagen Passat (2001)
Honda Accord (2002)
Chevrolet Corvette (2002)

ANALYSIS

A quick look at Listing 16.7 shows you that each car name is preceded by the manufacturer name. This name was grabbed from another section of the document, similar to another table in a database. Using keys this way is a great way to combine related data.
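The analysis of Listing 16.6 mentions that a key can be combined from different nodes. One way to do that is to build the key value with concat(). The following sketch assumes Listing 16.5 as the source; the key name model-by-mfr-year and the | separator are arbitrary choices made for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:car="http://www.example.com/xmlns/car"
  xmlns:m="http://www.example.com/xmlns/manufacturer">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Hypothetical combined key: manufacturer ID and year, joined
       by a separator that occurs in neither value -->
  <xsl:key name="model-by-mfr-year" match="car:model"
           use="concat(@m:id, '|', @car:year)" />

  <xsl:template match="/">
    <!-- The lookup value must be combined in exactly the same way -->
    <xsl:value-of
        select="key('model-by-mfr-year', 'FO|2001')/@car:name" />
    <xsl:text>&#xA;</xsl:text>

    <!-- The same idea with the value assembled by an expression -->
    <xsl:value-of
        select="key('model-by-mfr-year',
                    concat('TY', '|', 1999))/@car:name" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Applied to Listing 16.5, this should output Mustang and Camry. The separator matters: without it, different pairs of values could run together into the same string and be found under the same lookup value.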

From Listing 16.6, you might think that the value on which a key is based needs to be unique. That is by no means true. If multiple elements have the same key value, the key() function will return a node-set with those nodes instead of just one node. You can use that node-set just like any other node-set. Listing 16.8 shows a stylesheet with such a nonunique key.

LISTING 16.8 Stylesheet with Nonunique Key

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0" exclude-result-prefixes="car m"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
4:    xmlns:car="http://www.example.com/xmlns/car"
5:    xmlns:m="http://www.example.com/xmlns/manufacturer">
6:
7:    <xsl:output method="html" encoding="UTF-8" />
8:    <xsl:key name="cars" match="car:model" use="@m:id" />
9:
10:   <xsl:template match="/">
11:     <html>
12:     <body>
13:       <h1>Auto show</h1>
14:       <xsl:apply-templates select="/car:cars/m:manufacturers" />
15:     </body>
16:     </html>
17:   </xsl:template>
18:
19:   <xsl:template match="m:manufacturers">
20:     <xsl:for-each select="m:manufacturer">
21:       <h2><xsl:value-of select="@m:name" /></h2>
22:       <p><i>Country: <xsl:value-of select="@m:country" /></i></p>
23:       <xsl:for-each select="key('cars', @m:id)">
24:         <ul>
25:           <li>
26:             <xsl:value-of select="@car:name" />
27:             <xsl:text> (</xsl:text>
28:             <xsl:value-of select="@car:year" />
29:             <xsl:text>)</xsl:text>
30:           </li>
31:         </ul>
32:       </xsl:for-each>
33:     </xsl:for-each>
34:   </xsl:template>
35: </xsl:stylesheet>

ANALYSIS

The first seven lines of Listing 16.8 are the same as in Listing 16.6, with two exceptions: line 7 defines the output as HTML instead of text, and line 2 tells the processor which namespaces to exclude from that output. Line 8 defines a key on the car:model element, using m:id. The value of the m:id attribute is not unique for each of the car:model elements. In fact, for each value of the attribute, there are two car:model elements in Listing 16.5. The template on line 10 creates the HTML base code and on line 14 selects the m:manufacturers element and invokes the processor to match templates. The template on line 19 matches that element and iterates through all m:manufacturer elements in Listing 16.5. Line 23 uses the key() function to select any car:model elements that have the same value for their m:id attribute as the m:id attribute of the current m:manufacturer element. An xsl:for-each element iterates through these elements.

Listing 16.8 is similar to several examples from previous lessons. In fact, the output shown in Listing 16.9 is exactly the same as for those examples. Those examples used the current() function or a variable to circumvent the problem that the current node is out of context when the xsl:for-each element on line 23 uses a location path. With the key() function in this example, that problem doesn’t happen, so the value used to select the car:model elements can be taken directly from the context node.

OUTPUT

LISTING 16.9 Result from Applying Listing 16.8 to Listing 16.5

<html>
   <body>
      <h1>Auto show</h1>
      <h2>Volkswagen</h2>
      <p><i>Country: Germany</i></p>
      <ul>
         <li>Golf (1999)</li>
      </ul>
      <ul>
         <li>Passat (2001)</li>
      </ul>
      <h2>Toyota</h2>
      <p><i>Country: Japan</i></p>
      <ul>
         <li>Camry (1999)</li>
      </ul>
      <ul>
         <li>Celica (2000)</li>
      </ul>
      <h2>Ford</h2>
      <p><i>Country: USA</i></p>
      <ul>
         <li>Focus (2000)</li>
      </ul>
      <ul>
         <li>Mustang (2001)</li>
      </ul>
      <h2>Chevrolet</h2>
      <p><i>Country: USA</i></p>
      <ul>
         <li>Prizm (2000)</li>
      </ul>
      <ul>
         <li>Corvette (2002)</li>
      </ul>
      <h2>Honda</h2>
      <p><i>Country: Japan</i></p>
      <ul>
         <li>Civic (2000)</li>
      </ul>
      <ul>
         <li>Accord (2002)</li>
      </ul>
   </body>
</html>
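A nonunique key also offers another route to the distinct values selected earlier in this lesson. The following sketch uses a technique commonly known as Muenchian grouping, which is not the approach taken in Listing 16.3. It assumes Listing 16.2 as the source and relies on the generate-id() function, which returns the same identifier string whenever it is given the same node. The key name by-manufacturer is a hypothetical name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Hypothetical nonunique key: two model elements per value -->
  <xsl:key name="by-manufacturer" match="model" use="@manufacturer" />

  <xsl:template match="/">
    <!-- Keep a model only if it is the first node that the key
         returns for its own manufacturer value -->
    <xsl:for-each select="/cars/model[generate-id() =
        generate-id(key('by-manufacturer', @manufacturer)[1])]">
      <xsl:sort select="@manufacturer" />
      <xsl:value-of select="concat(@manufacturer, '&#xA;')" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The result should match Listing 16.4. Whether this performs better than the preceding-sibling approach depends on how your processor implements keys, so measuring is again the way to find out.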

One of the great things about keys is that you can define a key for different elements. If you use the key() function with such a key, any element that matches is returned. Listing 16.10 shows an XML sample that can benefit from this functionality.

LISTING 16.10 Sample XML with Movies

<?xml version="1.0" encoding="UTF-8"?>
<movies>
  <movie title="The Good, the Bad, and the Ugly" genre="western">
    <director name="Sergio Leone" />
    <actor name="Clint Eastwood" character="Biondo" />
    <actor name="Lee Van Cleef" character="Angel Eyes Sentenza" />
    <actor name="Eli Wallach" character="Tuco Ramirez" />
  </movie>
  <movie title="The Piano" genre="drama">
    <director name="Jane Campion" />
    <actor name="Holly Hunter" character="Ada McGrath" />
    <actor name="Harvey Keitel" character="George Baines" />
  </movie>
  <movie title="Bird" genre="drama">
    <director name="Clint Eastwood" />
    <actor name="Forest Whitaker" character="Charlie 'Bird' Parker" />
  </movie>
  <movie title="Stir Crazy" genre="comedy">
    <director name="Sidney Poitier" />
    <actor name="Gene Wilder" character="Skip Donahue" />
    <actor name="Richard Pryor" character=" Harry Monroe" />
  </movie>
</movies>

ANALYSIS

Listing 16.10 contains several movie elements with information about a movie. Each movie element has a title and genre attribute and several actor and director child elements. These elements have at least a name attribute with the name of the actor or director. Listing 16.11 shows a stylesheet that selects data with a key matching multiple elements.

LISTING 16.11 Stylesheet with Key Matching Different Elements

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:    <xsl:key name="searchkey" match="actor" use="@name" />
7:    <xsl:key name="searchkey" match="director" use="@name" />
8:    <xsl:param name="search" />
9:
10:   <xsl:template match="/">
11:     <xsl:for-each select="key('searchkey', $search)">
12:       <xsl:value-of select="@name" />
13:       <xsl:text>&#xA;-</xsl:text>
14:       <xsl:value-of select="name()" />
15:       <xsl:text>: </xsl:text>
16:       <xsl:value-of select="../@title" />
17:       <xsl:text>&#xA;</xsl:text>
18:     </xsl:for-each>
19:   </xsl:template>
20: </xsl:stylesheet>

ANALYSIS

On lines 6 and 7 in Listing 16.11, the xsl:key element is used twice for the same key name. The match attribute for the two elements is different. The effect is that the searchkey key applies to both the actor and director elements. A parameter defined on line 8 is later used as the value to search for in the key. On line 11, the key() function selects actor and director elements that have a name attribute identical to the value of the search parameter. The body of the xsl:for-each element on line 11 outputs that name on line 12 and whether it is an actor or director on line 14. Line 16 goes up a level to the movie element and outputs the title of the movie. The rest of the body is just there for formatting. When Listing 16.11 is applied to Listing 16.10 and the parameter passed has the value Clint Eastwood, Listing 16.12 is the output.

OUTPUT

LISTING 16.12 Result from Applying Listing 16.11 to Listing 16.10

Clint Eastwood
-actor: The Good, the Bad, and the Ugly
Clint Eastwood
-director: Bird

ANALYSIS

The key() function in Listing 16.11 returns Clint Eastwood as actor in one movie and director of another. This was the whole idea of using the combined key. Each element is used with its surrounding elements to get the output in Listing 16.12.

In Listing 16.11, the key is defined with two xsl:key elements. Because the use attribute has the same value for both, you can replace those two lines with one xsl:key element, which would look like this:

<xsl:key name="searchkey" match="actor|director" use="@name" />

The match attribute in the preceding sample uses a union pattern to combine two elements for the key. Such a pattern can be more complex, of course. You need two separate xsl:key elements if the use attribute must have a different value for each. This would be the case if you were searching for movies and the director was stored as an attribute of the movie element instead of as a child element. In that case, the key should be defined as follows:

<xsl:key name="searchkey" match="movie" use="actor/@name" />
<xsl:key name="searchkey" match="movie" use="@director" />
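For the document as it stands in Listing 16.10, where actors and directors are both child elements, a single xsl:key element can also index each movie under several values at once. When the use expression yields a node-set, every node in it contributes a key value. The following sketch assumes Listing 16.10 as the source; the key name moviekey is hypothetical, and the default value of the search parameter is only there to make the sketch run on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Hypothetical key: each movie is indexed under the name of
       every actor and director among its child elements -->
  <xsl:key name="moviekey" match="movie"
           use="actor/@name | director/@name" />

  <xsl:param name="search" select="'Clint Eastwood'" />

  <xsl:template match="/">
    <!-- Returns the movie elements themselves, in document order -->
    <xsl:for-each select="key('moviekey', $search)">
      <xsl:value-of select="@title" />
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the default parameter value, the output should list The Good, the Bad, and the Ugly followed by Bird. Unlike Listing 16.11, this key returns movie elements, so it no longer tells you whether the person was found as an actor or as a director.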

Caution

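You can’t define a key dynamically using variables and parameters. In XSLT 1.0, the match and use attributes of the xsl:key element are not allowed to contain variable or parameter references.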

Working with Unique IDs

Document Type Definitions (DTDs) enable you to define an attribute of an element as an ID attribute. The value of such an attribute has to be unique across the entire XML document, so the attribute is a unique identifier for its element. This property is useful to track down elements in a way similar to when you use keys. Keys are much more flexible, but if you work with documents that already use a DTD that defines an ID attribute, why not use it? Listing 16.13 shows an XML document with an internal DTD.

LISTING 16.13 XML Document with an Internal DTD

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <!DOCTYPE manufacturers [
3:    <!ELEMENT manufacturers (manufacturer)*>
4:    <!ELEMENT manufacturer EMPTY>
5:    <!ATTLIST manufacturer
6:      id      ID     #REQUIRED
7:      name    CDATA  #REQUIRED
8:      country CDATA  #REQUIRED>
9:  ]>
10: <manufacturers>
11:   <manufacturer id="VW" name="Volkswagen" country="Germany" />
12:   <manufacturer id="TY" name="Toyota" country="Japan" />
13:   <manufacturer id="FO" name="Ford" country="USA" />
14:   <manufacturer id="CV" name="Chevrolet" country="USA" />
15:   <manufacturer id="HO" name="Honda" country="Japan" />
16: </manufacturers>

Note

Discussing DTDs in detail is beyond the scope of this book. Apart from ID attributes, DTDs are not likely to have any effect on how you create a stylesheet.

ANALYSIS

Lines 2 through 9 in Listing 16.13 define an internal DTD that is used to validate the XML when it is loaded by a validating parser. Line 3 declares the manufacturers element, which can contain zero or more manufacturer elements. Line 4 defines the manufacturer element as being empty, meaning that it cannot have any content, except for attributes. Lines 5 through 8 define the attributes for the manufacturer element. The definitions correspond to the attributes that these elements have in the XML document. For each attribute, the data type is defined. In the case of the name and country attributes, this is character data (string), which is denoted by CDATA. The definition of the id attribute on line 6 tells the parser that this attribute is of type ID, which means that the parser has to make sure that each value of this attribute occurs only once in the entire document. Each attribute is defined as #REQUIRED, which means that the element must have this attribute, or it is invalid.

In the preceding example, the attribute that is defined of type ID is named id. It is a misconception that an ID type attribute must have the name id or ID. The requirement is that the values are unique; you can name the attribute any way you want. The following defines an attribute named ssn as an ID attribute:

<!ATTLIST Person
ssn ID #REQUIRED>

Selecting Data with a Unique ID

An element that has an ID type attribute can be selected using the id () function. This function resembles the key () function, but it has only one argument—the value of the attribute you’re looking for. The id () function returns the corresponding element for which the ID type attribute is defined. Listing 16.14 shows this function in action.

LISTING 16.14 Stylesheet Selecting on a Unique ID

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:
7:    <xsl:template match="/">
8:      <xsl:value-of select="id ('FO')/@name" />
9:    </xsl:template>
10: </xsl:stylesheet>

ANALYSIS

Listing 16.14 is simple. It has only one template, and on line 8, it outputs one value—the value of the name attribute of the manufacturer element that has an id attribute with the value FO. Listing 16.15 shows the result when Listing 16.14 is applied to Listing 16.13.

OUTPUT

LISTING 16.15 Result from Applying Listing 16.14 to Listing 16.13

Ford

ANALYSIS

Listing 16.15 is not much to look at because there is only one element in Listing 16.13 for which the id attribute has the value FO. This element has a name attribute with the value Ford. So, you can see that the id () function returns the node corresponding to the given ID value.
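Because the id () function resembles the key () function, the same lookup can also be written with a key. The following is a minimal sketch, not one of the book's listings; the key name by-id is made up for illustration. It might be useful when you cannot be sure that the parser reads the DTD (for instance, an external DTD skipped by a nonvalidating parser), because in that case the processor has no way of knowing which attributes are of type ID and id () selects nothing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Hypothetical key named by-id: indexes each manufacturer element
       on its id attribute, independent of any DTD declaration -->
  <xsl:key name="by-id" match="manufacturer" use="@id" />

  <xsl:template match="/">
    <!-- Same lookup as id ('FO') in Listing 16.14, expressed with a key -->
    <xsl:value-of select="key ('by-id', 'FO')/@name" />
  </xsl:template>
</xsl:stylesheet>
```

Applied to Listing 16.13, this sketch should produce the same result as Listing 16.14. The trade-off is that the key does not enforce uniqueness; it simply returns every manufacturer element whose id attribute has the requested value.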

IDs, IDREFS, and XML Schema

Closely related to the ID type attributes in a DTD are IDREFS type attributes, which hold references to ID values and so define relationships between different elements. XSLT supports only ID type attributes, through the id () function, and has no dedicated support for IDREFS.

XML Schema uses a different mechanism entirely for this functionality. This mechanism closely resembles keys in XSLT. XML Schema is not yet supported by XSLT, so unfortunately it isn’t of much use to XSLT developers at this time. XML Schema support will not be available in XSLT before version 2.0.

Inserting Unique IDs

Suppose you want to create an XML document that will be validated by an external DTD. In that case, you need some way of making sure that the attributes designated as ID attributes in the DTD contain unique values. You can achieve this by creating the values for these attributes with the generate-id () function, which returns a string that uniquely identifies a node. You can use this function without an argument, in which case it generates the identifier for the context node. You also can pass an argument: an expression that selects a certain node. The identifier is then generated for that node. If the argument is a node-set containing more than one node, the result is based on the first node of the node-set in document order. The other nodes are ignored.

The idea here is that, for each node in the document, the generate-id () function returns a unique value. If you call generate-id () again for the same node later in the same transformation, the result is the same. A processor is not required to produce the same values the next time the document is transformed, so you should not rely on a particular value. This feature is useful when you create a document in which elements reference each other, for instance, when you create an HTML document with links to specific places in a document. Because this approach is actually much more interesting than creating an XML document with an ID type attribute, the next example will focus on it. In the exercise at the end of this lesson, you will create an XML document with unique identifiers. Listing 16.16 shows a stylesheet that creates an HTML document with referring links based on the generated ID value of a node.

LISTING 16.16 Stylesheet Generating IDs Linking to Other Elements

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0" exclude-result-prefixes="car m"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
4:    xmlns:car="http://www.example.com/xmlns/car"
5:    xmlns:m="http://www.example.com/xmlns/manufacturer">
6:
7:    <xsl:output method="html" encoding="UTF-8" />
8:    <xsl:strip-space elements="*" />
9:    <xsl:key name="mfc" match="m:manufacturer" use="@m:id" />
10:
11:   <xsl:template match="/">
12:     <html>
13:     <body>
14:       <xsl:apply-templates select="/car:cars/car:models" />
15:       <hr />
16:       <xsl:apply-templates select="/car:cars/m:manufacturers" />
17:     </body>
18:     </html>
19:   </xsl:template>
20:
21:   <xsl:template match="car:models">
22:     <h2>Car Models</h2>
23:     <ul>
24:       <xsl:for-each select="car:model">
25:         <li>
26:           <a href="#{generate-id (key ('mfc', @m:id))}">
27:             <xsl:value-of select="key ('mfc', @m:id)/@m:name" />
28:           </a>
29:           <xsl:text> </xsl:text>
30:           <xsl:value-of select="@car:name" />
31:           <xsl:value-of select="concat (' (',@car:year,')')" />
32:         </li>
33:       </xsl:for-each>
34:     </ul>
35:   </xsl:template>
36:
37:   <xsl:template match="m:manufacturers">
38:     <h2>Manufacturers</h2>
39:     <ul>
40:       <xsl:apply-templates />
41:     </ul>
42:   </xsl:template>
43:
44:   <xsl:template match="m:manufacturer">
45:     <li>
46:       <a name="{generate-id ()}">
47:         <xsl:value-of select="@m:name" />
48:       </a>
49:         <xsl:value-of select="concat (' (',@m:country,')')" />
50:     </li>
51:   </xsl:template>
52: </xsl:stylesheet>

ANALYSIS

Listing 16.16 uses a mix of techniques. Because this stylesheet is used with Listing 16.5, it needs to declare the same namespaces, which it does on lines 4 and 5. Line 9 defines a key on the m:manufacturer elements for quick access. The root template on line 11 creates the HTML base code and uses xsl:apply-templates twice: on line 14 for the cars and on line 16 for the manufacturers. The former matches the template on line 21, which creates a header and iterates through the car:model elements with line 24. Line 26 creates a link using the generate-id () function. The key () function provides that function with the element the ID is generated for, which is the m:manufacturer element belonging to the car:model element being processed. The same key is used on line 27 to get the name of the manufacturer. The rest of that template outputs the values of the current car:model element plus some formatting. The manufacturers are processed by two templates: The template on line 37 just takes care of the outer shell, and the template on line 44 actually outputs the values and the formatting for each manufacturer. Line 46 creates an HTML anchor that the link discussed earlier links to. To accomplish this, the anchor needs to get the same ID value. The generate-id () function creates an ID for the context node, which is an m:manufacturer node, so the ID value is, in fact, the same as that created on line 26. The rest of the template is again mostly values and formatting. Listing 16.17 shows the result.

OUTPUT

LISTING 16.17 Result from Applying Listing 16.16 to Listing 16.5

1:  <html>
2:     <body>
3:        <h2>Car Models</h2>
4:        <ul>
5:           <li><a href="#d1e14">Volkswagen</a> Golf (1999)
6:           </li>
7:           <li><a href="#d1e15">Toyota</a> Camry (1999)
8:           </li>
9:           <li><a href="#d1e16">Ford</a> Focus (2000)
10:          </li>
11:          <li><a href="#d1e18">Honda</a> Civic (2000)
12:          </li>
13:          <li><a href="#d1e17">Chevrolet</a> Prizm (2000)
14:          </li>
15:          <li><a href="#d1e15">Toyota</a> Celica (2000)
16:          </li>
17:          <li><a href="#d1e16">Ford</a> Mustang (2001)
18:          </li>
19:          <li><a href="#d1e14">Volkswagen</a> Passat (2001)
20:          </li>
21:          <li><a href="#d1e18">Honda</a> Accord (2002)
22:          </li>
23:          <li><a href="#d1e17">Chevrolet</a> Corvette (2002)
24:          </li>
25:       </ul>
26:       <hr>
27:       <h2>Manufacturers</h2>
28:       <ul>
29:          <li><a name="d1e14">Volkswagen</a> (Germany)
30:          </li>
31:          <li><a name="d1e15">Toyota</a> (Japan)
32:          </li>
33:          <li><a name="d1e16">Ford</a> (USA)
34:          </li>
35:          <li><a name="d1e17">Chevrolet</a> (USA)
36:          </li>
37:          <li><a name="d1e18">Honda</a> (Japan)
38:          </li>
39:       </ul>
40:    </body>
41:  </html>

ANALYSIS

If all is well, Listing 16.17 should contain links from each car to its manufacturer. The links are based on ID values created with the generate-id () function. So, for the cars from the same manufacturer, the ID value should be the same and link to the manufacturer with that ID. For the Volkswagen cars, this means that the value of the href attributes on lines 5 and 19 should match the value of the name attribute on line 29. As you can see, that is actually the case. If you check the other cars and manufacturers, you will find that the value is correct for each of them, as it should be.
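The links work only because generate-id () returns the same string for the same node, no matter how that node was reached. The following sketch is an illustration rather than one of the book's listings. It is written for Listing 16.13 instead of Listing 16.5 to keep it free of namespaces, and the variable name same is made up. For each manufacturer, it compares the ID generated for the context node with the ID generated for the same node selected along a different path.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <xsl:for-each select="manufacturers/manufacturer">
      <!-- Select the current element again, this time starting at the root -->
      <xsl:variable name="same"
        select="/manufacturers/manufacturer[@id = current ()/@id]" />
      <xsl:value-of select="@name" />
      <xsl:text>: </xsl:text>
      <!-- No argument: ID of the context node. With argument: ID of the
           first node of the node-set in $same -->
      <xsl:value-of select="generate-id () = generate-id ($same)" />
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because the id values in Listing 16.13 are unique, the variable always holds exactly the context node, so each line of the output should end in true. The generated strings themselves differ from processor to processor, which is why the sketch compares them rather than printing them.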

Using Keys and Generated IDs to Select Distinct Values

Earlier in this lesson, you learned how to select only the distinct values from a node-set that might contain duplicate values. You learned that you can use the preceding-sibling axis to accomplish this task, but that with large node-sets, this approach might not perform well. Provided that a processor builds an internal index when it creates a key, you can use keys to improve performance with large node-sets. Listing 16.18 shows you how this works.

LISTING 16.18 Stylesheet Selecting Distinct Values Based on a Key

1:  <?xml version="1.0" encoding="UTF-8"?>
2:  <xsl:stylesheet version="1.0"
3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4:
5:    <xsl:output method="text" encoding="UTF-8" />
6:
7:    <xsl:key name="distinct" match="model" use="@manufacturer" />
8:
9:    <xsl:template match="/">
10:     <xsl:for-each select="//model[generate-id () =
11:          generate-id (key ('distinct', @manufacturer)[1])]/@manufacturer">
12:       <xsl:sort select="." />
13:       <xsl:value-of select="concat (.,'&#xA;')" />
14:     </xsl:for-each>
15:   </xsl:template>
16: </xsl:stylesheet>

ANALYSIS

Listing 16.18 is the same as Listing 16.3 for the most part. Line 7 defines a key for all the model elements. The key is generated based on the value of the manufacturer attribute, which in Listing 16.2 is not unique for each model element. Lines 10 and 11 contain the expression selecting the distinct values, which is totally different from that used on lines 8 and 9 in Listing 16.3. The predicate that makes sure you get distinct values uses the key () and generate-id () functions to filter out the duplicate nodes. The expression is based on the fact that the generate-id () function always creates the same ID value for the same node. The key () function selects all the model elements that have the same value for the manufacturer attribute. Only the first of those should be sent to the output, which means that the generated ID value of the current model element should be the same as that of the first node in the node-set returned by the key () function. Listing 16.18 will yield the exact same result as Listing 16.3, but it is likely to perform better with large node-sets.

There is another alternative for the predicate expression in Listing 16.18. If you select the current node and the first node returned by the key () function, they should be the same. This means that a union of these two elements would actually be one element, so when you count the number of nodes in the node-set, the result should be 1. This is expressed by the following predicate:

count (.|key ('distinct', @manufacturer)[1]) = 1

This expression counts the number of nodes in the union (expressed by the pipe symbol) of the current node and the first in the node-set returned by the key () function.
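To see the alternative predicate in context, the following sketch repeats the stylesheet from Listing 16.18 with only the predicate changed. It assumes the same source document (Listing 16.2, with model elements that have a manufacturer attribute) and is meant as an illustration, not as a replacement for the listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Same key as in Listing 16.18 -->
  <xsl:key name="distinct" match="model" use="@manufacturer" />

  <xsl:template match="/">
    <!-- Keep a model only if it is the first node the key returns for its
         manufacturer: the union of the two then contains one node -->
    <xsl:for-each select="//model[count (. | key ('distinct',
         @manufacturer)[1]) = 1]/@manufacturer">
      <xsl:sort select="." />
      <xsl:value-of select="concat (.,'&#xA;')" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Both predicates test whether two expressions refer to the same node, so the result should be identical. Which of the two performs better depends on the processor, so if it matters, test both.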

Summary

Today you learned about some intricate features of expressions. These features can give you a hard time because they can produce results other than what you expected. These problems are related to implicit type conversion 99 out of 100 times. It is imperative that you learn these rules well, so rereading the lesson from Day 10 wouldn’t be a bad idea.

You also learned that you can use keys to retrieve data easily, in the process probably increasing the performance of your stylesheet. Keys give you quick access to the elements for which you have defined a key. A key can match nodes of the same name and type, but it also can match nodes of different types and names. In addition, these nodes can be scattered throughout a document, depending on the match expression you use to create the index.

A last method to quickly retrieve elements is tied into the use of ID type attributes defined in a DTD. These attributes must have a unique value across a document, so they can be used by the processor to retrieve elements based on that value, no matter where the element is located. Because such an attribute needs to be defined in a DTD, this functionality is useful only in valid XML documents that have ID type attributes defined.

In tomorrow’s lesson, you’ll learn about recursion, which is, among other things, important for computational stylesheets. Recursion is an important mechanism to get around some of the problems with variables.

Q&A

Q Is there a limit to the number of keys I can create?

A No. You can use as many keys as you want. You should, however, use keys only if it really makes sense from a performance point of view or if using them greatly simplifies your expressions. I suggest testing the key performance of your processor of choice before committing to an implementation using keys if performance is critical.

Q Can I search for more than one ID value at a time?

A Yes. The argument of the id () function can actually be a whitespace-separated list of ID values searched for. All elements that match are returned as a node-set.
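As a small illustration of this answer, the following sketch (written for Listing 16.13, not one of the book's listings) passes two ID values to the id () function in a single string and outputs the name of each element found.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- One string holding two ID values separated by whitespace -->
    <xsl:for-each select="id ('HO VW')">
      <xsl:value-of select="concat (@name,'&#xA;')" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The result is a node-set, so xsl:for-each processes the elements in document order rather than in the order of the values in the string. With Listing 16.13, Volkswagen should therefore appear before Honda.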

Q Why are IDREFS not supported in XSLT?

A Unless you have access to the DTD, IDREFS don’t really make much sense in XSLT because you can’t know which values are actually IDREFS. If you do know, you know at design time, so you can use that information when creating the stylesheet.
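To show what using that information at design time might look like, the following sketch assumes a variation on Listing 16.13 in which the manufacturer element has an extra attribute named partners of type IDREFS. Both the attribute and its purpose are hypothetical and appear nowhere in this lesson's listings. Because an IDREFS value is a whitespace-separated list of ID values, it can be handed to the id () function as is.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed addition to the ATTLIST in the DTD (hypothetical):
         partners IDREFS #IMPLIED
       and an element using it, for example:
         <manufacturer id="TY" name="Toyota" country="Japan"
                       partners="CV FO" /> -->

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <xsl:for-each select="manufacturers/manufacturer">
      <xsl:value-of select="@name" />
      <!-- The stylesheet author knows that partners holds ID values, so
           its value is passed straight to id (); an element without the
           attribute simply yields an empty node-set -->
      <xsl:for-each select="id (@partners)">
        <xsl:value-of select="concat (' partner: ', @name)" />
      </xsl:for-each>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The processor still has to know which attributes are of type ID, so, like the other id () examples, this sketch works only if the parser reads the DTD.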

Workshop

This workshop tests whether you understand all the concepts you learned today. It is helpful to know and understand the answers before starting tomorrow’s lesson. You can find the answers to the quiz questions and exercises in Appendix A.

Quiz

1. True or False: Values used as a key need to be unique.

2. True or False: The generate-id () function always creates the same value if used on the same node in a document.

3. If you compare a string or number value to a node-set, when will the comparison say that the values are equal?

4. Can you benefit from storing the result of the key () function in a variable?

5. Why can the value of an ID type attribute be used to retrieve data quickly?

Exercise

1. Create a stylesheet for Listing 16.13. This stylesheet should re-create this document (excluding the DTD), but with the values of the id attribute created with generate-id (). The original value can be discarded or put in another attribute named code.
