WEEK 3 Day 21
Designing XML and XSLT Applications

In the preceding lessons, you learned about all the individual pieces of the XSLT language. On Day 19, “Working with XSLT Extensions,” you even learned how to extend the XSLT language, and in yesterday’s lesson, you learned how to deal with the differences between processors.

Today you will learn more about the bigger picture when creating XML and XSLT applications. Instead of discussing individual elements or functions, this lesson looks at design considerations when you’re creating XML and XSLT applications. This does not mean that this lesson will give you the ultimate way to design applications. The discussion centers around the questions that will face you each time you have to create an application and how to answer these questions on an application-by-application basis.

In today’s lesson, you will learn the following:

• How to decide whether your XML should contain elements or attributes for certain values

• Which options you have when defining the hierarchy of your XML

• How to decide between matching and selection

• When to use variables or keys

Designing XML

XML is a versatile data storage format. For most data, you have a multitude of options on how to set up the data structure. The choices you make will have a lasting effect on your application, so you should design your XML structures with care.

Important aspects to the design of your XML structures for an application are the application domain and use of data. If the data will be displayed without much change to the data itself, using a format that closely resembles the formatted output makes creating your stylesheets much easier because you don’t have to shuffle data around. On the other hand, if the data needs to be processed and altered before being displayed, you will want to create a format that is most suited for manipulating the data, without resorting to complex expressions or variables.

Closely linked to the application domain is the way XML is used in an application. Using it for storage and display is one thing, but it might also be used to transmit data between sections of an application. For instance, if your application is an order system, data from an order must be passed on to invoicing, shipping, and so on. Before you start to work on the XML design itself, you should have a design of the application itself—for example, in Unified Modeling Language (UML). Specifically, sequence diagrams that show what happens on certain actions and state diagrams that show you the state of the application under certain conditions are useful. Such diagrams show you where XML might be needed and which data should be stored in XML or transmitted with XML.

The problem with design is that deciding which options you should go for is not always clear-cut. One reason you store data in an XML document in the first place is that you can create different stylesheets to create different output. This means that if you design the structure of your XML document from the point of view of one of the output formats, creating a stylesheet for the other format might be a tall order. You therefore need to consider what the common denominators are and design from there. This is a purely methodological breakdown of your application. You can start by creating sample output for each output you need to create from an XML source. Chances are you’ll see structures that are similar in each case. Those shared structures are good candidates for the structure of your XML document.

One important point to remember is that you can restructure your XML documents. So, you can create a first draft of your design that holds all the (sample) data. While you’re creating the draft, you will likely see that some things work well and some things don’t. From the draft, you can create sections of the different output that you’ll need for your application. This way, you can quickly see where you will encounter problems if you keep the structure as it is. As long as you stick to templates as much as possible, changes to the data don’t affect your stylesheets very much. In many cases, you can alter a template just slightly so that it will create the same output but from data that is structured differently.

Consider the following structure:

<name firstname="Michiel" lastname="van Otegem" />

A template processing this structure might look like this:

<xsl:template match="name">
    <xsl:value-of select="@lastname" />
    <xsl:text>, </xsl:text>
    <xsl:value-of select="@firstname" />
</xsl:template>

Later, you discover that people can have more than one first name, so you change the XML to

<name>
    <firstname>Michiel</firstname>
    <lastname>van Otegem</lastname>
</name>

Changing the template accordingly is not a big problem; you just need to remove the attribute markers from the data selection so that it’ll look as follows:

<xsl:template match="name">
    <xsl:value-of select="lastname" />
    <xsl:text>, </xsl:text>
    <xsl:value-of select="firstname[1]" />
</xsl:template>

Because of the way XSLT selects data, this template produces the same output as the former, with the corrected XML source. Even if the changes to the template aren’t trivial, the other templates in the stylesheet will stay the same, for the most part, because each template acts as an independent unit. Once a unit works well and the XML structure for that unit is final, you can use it indefinitely, even across applications.
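
If the stylesheet should later show every first name rather than only the first one, only this one template has to change. The following is a minimal sketch of such a variant, not part of the original stylesheet. It assumes the element-based name structure shown above, possibly with repeated firstname elements:

<xsl:template match="name">
    <xsl:value-of select="lastname" />
    <xsl:text>,</xsl:text>
    <!-- Write each firstname child in document order, preceded by a space -->
    <xsl:for-each select="firstname">
        <xsl:text> </xsl:text>
        <xsl:value-of select="." />
    </xsl:for-each>
</xsl:template>

For a name element with a single firstname child, this variant should produce the same text as the template above.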

During the design phase, you can look at XML as moldable. The data can take the shape you want it to, and if it doesn’t work, you can reshape it. You can do this directly with an XML document itself, which is unlike most other data storage formats. For instance, you cannot create and change database tables and relationships quickly. Database design is therefore often done on paper or with design tools. Before you take any action on the database itself, the design must be ready. This is not so for XML, which you can change easily, even during the implementation phase if you encounter a problem. That doesn’t mean, of course, that it isn’t a good idea to use data modeling tools and techniques. They have been around for years and are a product of experience. That said, most design techniques and tools are still aimed at the more rigid data formats around, so in the end, you can perform tasks with XML that go beyond what those tools and techniques allow you to do. So, although they are a good starting point, using XML itself during the design phase can help you iron out implementation-type problems before you actually start implementing an application. In fact, the design and implementation phases with XML applications are much closer to each other because the data design is the same as the data format, whereas a database design is nowhere near the actual format.

XML Design Considerations

When you design XML, you always need to make some key considerations. The hierarchy you use is, of course, very important, but also the choice between elements and attributes, and so on. Although there is no definitive answer to the question “What is better?” the following sections will help you decide.

Setting Up a Hierarchy

The hierarchy you use in a document is very important, specifically for how you select data using a stylesheet. If two (or more) sets of data are related to one another, how do you define their relationship? One way is to have values that reference each other, as shown in Listing 21.1.

LISTING 21.1 XML Document with Car-Manufacturer Relationship

     <?xml version="1.0" encoding="UTF-8"?>
     <cars>
       <models>
         <model name="Golf" manufacturer="VW" year="1999" />
         <model name="Camry" manufacturer="TY" year="1999" />
         <model name="Focus" manufacturer="FO" year="2000" />
         <model name="Civic" manufacturer="HO" year="2000" />
         <model name="Prizm" manufacturer="CV" year="2000" />
         <model name="Celica" manufacturer="TY" year="2000" />
         <model name="Mustang" manufacturer="FO" year="2001" />
         <model name="Passat" manufacturer="VW" year="2001" />
         <model name="Accord" manufacturer="HO" year="2002" />
         <model name="Corvette" manufacturer="CV" year="2002" />
       </models>
       <manufacturers>
         <manufacturer id="VW" name="Volkswagen" country="Germany" />
         <manufacturer id="TY" name="Toyota" country="Japan" />
         <manufacturer id="FO" name="Ford" country="USA" />
         <manufacturer id="CV" name="Chevrolet" country="USA" />
         <manufacturer id="HO" name="Honda" country="Japan" />
       </manufacturers>
     </cars>

ANALYSIS

In Listing 21.1, the cars and manufacturers are related. Their relationship is defined by the manufacturer attribute of the model elements. That attribute’s value corresponds to the id attribute of the manufacturer elements. When you want to select data concerning a car’s manufacturer, you need to use a predicate expression such as

/cars/manufacturers/manufacturer[@id = current()/@manufacturer]/@name

The other way around you are faced with the same problem. The preceding expression is less than delightful. Not only does it use a predicate expression to get to the data, but it also relies on absolute addressing to get to the data. If you were to make the data in Listing 21.1 part of a larger structure, the absolute addressing might change, making the expression useless. If you want to make a list with manufacturers and their cars, your stylesheet would look like Listing 21.2.

LISTING 21.2 Stylesheet Creating a List of Manufacturers and Cars from Listing 21.1

      1:  <?xml version="1.0" encoding="UTF-8"?>
      2:  <xsl:stylesheet version="1.0"
      3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      4:
      5:    <xsl:output method="text" encoding="UTF-8" />
      6:
      7:    <xsl:template match="/">
      8:      <xsl:apply-templates select="/cars/manufacturers" />
      9:    </xsl:template>
      10:
      11:   <xsl:template match="manufacturers">
      12:     <xsl:for-each select="manufacturer">
      13:       <xsl:value-of select="@name" />
      14:       <xsl:value-of select="concat('  (',@country,')&#xA;')" />
      15:       <xsl:for-each
      16:           select="/cars/models/model[@manufacturer = current()/@id]">
      17:         <xsl:value-of select="concat('-',@name,'  (',@year,')&#xA;')" />
      18:       </xsl:for-each>
      19:       <xsl:text>&#xA;</xsl:text>
      20:     </xsl:for-each>
      21:   </xsl:template>
      22: </xsl:stylesheet>

ANALYSIS

Listing 21.2 creates a simple list of manufacturers and cars. To let the processor start processing the manufacturers instead of the cars, the xsl:apply-templates element on line 8 selects that data for matching. If the select attribute on that element were omitted, the processor would also visit the models and model elements without producing any output for them, which just costs processing cycles. The template on line 11 matches the manufacturers element and does all the processing using iteration. This template can hardly do without iteration, especially because of line 16, which selects the cars to be iterated. This expression used with a match template would require additional selection of the elements before invoking other templates as well. The result is shown in Listing 21.3.

OUTPUT

LISTING 21.3 Result from Applying Listing 21.2 to Listing 21.1

      Volkswagen  (Germany)
      -Golf  (1999)
      -Passat  (2001)

      Toyota  (Japan)
      -Camry  (1999)
      -Celica  (2000)

      Ford  (USA)
      -Focus  (2000)
      -Mustang  (2001)

      Chevrolet  (USA)
      -Prizm  (2000)
      -Corvette  (2002)

      Honda  (Japan)
      -Civic  (2000)
      -Accord  (2002)
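
The predicate on line 16 of Listing 21.2 can also be expressed with a key declared with xsl:key. The following fragment is a sketch of that alternative, not the approach the listing itself takes. The key name models-by-manufacturer is made up for illustration:

<!-- Top-level declaration: index model elements by their manufacturer attribute -->
<xsl:key name="models-by-manufacturer" match="model" use="@manufacturer" />

<xsl:template match="manufacturers">
  <xsl:for-each select="manufacturer">
    <xsl:value-of select="concat(@name,'  (',@country,')&#xA;')" />
    <!-- Look up the models whose manufacturer attribute equals this id -->
    <xsl:for-each select="key('models-by-manufacturer', @id)">
      <xsl:value-of select="concat('-',@name,'  (',@year,')&#xA;')" />
    </xsl:for-each>
    <xsl:text>&#xA;</xsl:text>
  </xsl:for-each>
</xsl:template>

The key removes the absolute path from the inner selection, but the relationship is still expressed through reference values rather than through the structure of the document.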

Now consider Listing 21.4, which holds the same data as Listing 21.1, but structured differently.

LISTING 21.4 XML Document with Car-Manufacturer Relationship in a Hierarchy

     <?xml version="1.0" encoding="UTF-8"?>
     <manufacturers>
       <manufacturer id="VW" name="Volkswagen" country="Germany">
         <model name="Golf" manufacturer="VW" year="1999" />
         <model name="Passat" manufacturer="VW" year="2001" />
       </manufacturer>
       <manufacturer id="TY" name="Toyota" country="Japan">
         <model name="Camry" manufacturer="TY" year="1999" />
         <model name="Celica" manufacturer="TY" year="2000" />
       </manufacturer>
       <manufacturer id="FO" name="Ford" country="USA">
         <model name="Focus" manufacturer="FO" year="2000" />
         <model name="Mustang" manufacturer="FO" year="2001" />
       </manufacturer>
       <manufacturer id="CV" name="Chevrolet" country="USA">
         <model name="Prizm" manufacturer="CV" year="2000" />
         <model name="Corvette" manufacturer="CV" year="2002" />
       </manufacturer>
       <manufacturer id="HO" name="Honda" country="Japan">
         <model name="Civic" manufacturer="HO" year="2000" />
         <model name="Accord" manufacturer="HO" year="2002" />
       </manufacturer>
     </manufacturers>

ANALYSIS

Instead of using reference values, Listing 21.4 uses the hierarchy of the XML document to define the relationship of the manufacturers and cars. This makes addressing one from the other easy because they have a relationship that can be defined with relative addressing. If you want the manufacturer of the current model element, you can get that element by using the expression parent::manufacturer, which is much more friendly than something with reference values and predicates. Listing 21.5 shows how using this hierarchy changes Listing 21.2.

LISTING 21.5 Stylesheet Creating a List of Manufacturers and Cars from Listing 21.4

      1:  <?xml version="1.0" encoding="UTF-8"?>
      2:  <xsl:stylesheet version="1.0"
      3:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      4:
      5:    <xsl:output method="text" encoding="UTF-8" />
      6:
      7:    <xsl:template match="/">
      8:      <xsl:apply-templates />
      9:    </xsl:template>
      10:
      11:   <xsl:template match="manufacturers">
      12:     <xsl:apply-templates />
      13:   </xsl:template>
      14:
      15:   <xsl:template match="manufacturer">
      16:     <xsl:value-of select="@name" />
      17:     <xsl:value-of select="concat('  (',@country,')&#xA;')" />
      18:     <xsl:apply-templates />
      19:     <xsl:text>&#xA;</xsl:text>
      20:   </xsl:template>
      21:
      22:   <xsl:template match="model">
      23:     <xsl:value-of select="concat('-',@name,'  (',@year,')&#xA;')" />
      24:   </xsl:template>
      25: </xsl:stylesheet>

ANALYSIS

Listing 21.5 creates the same output as shown in Listing 21.3 when it is applied to Listing 21.4. The structure of this stylesheet is different from Listing 21.2, however. Line 8 no longer selects the elements that need to be processed because that is not necessary with the hierarchy in Listing 21.4. Instead of using xsl:for-each to iterate through the manufacturer elements, the template on line 11 now uses matching. The template on line 15 matches the manufacturer elements and outputs their values. Instead of a nested xsl:for-each loop, line 18 uses matching again to match the child elements of the manufacturer elements. This approach is much easier than the predicate expression used earlier because, in essence, relative addressing is used. The template on line 22 outputs the values for the car on line 23. This stylesheet is preferable to the stylesheet in Listing 21.2 because it is much simpler, and each template is a unit of processing that can easily be replaced if you want to create different output. The templates in Listing 21.5 are independent of each other, whereas Listing 21.2 has one bulky template, which is also harder to understand.

As you can see, the difference between Listing 21.1 and Listing 21.4 has a huge impact on how the data can be processed. Most data selections in Listing 21.4 are much easier to accomplish. Also, getting only the model elements from Listing 21.4 isn’t much harder than getting them from Listing 21.1; using //model or /manufacturers/manufacturer/model will do the trick.
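
As a sketch of relative addressing in the hierarchical structure, the following stylesheet lists every model in Listing 21.4 together with the name of its manufacturer. The output layout is an arbitrary choice made for illustration:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Select only the model elements, skipping output for manufacturers -->
  <xsl:template match="/">
    <xsl:apply-templates select="/manufacturers/manufacturer/model" />
  </xsl:template>

  <!-- Reach the manufacturer through the parent axis; no reference value needed -->
  <xsl:template match="model">
    <xsl:value-of select="concat(@name,' - ',parent::manufacturer/@name,'&#xA;')" />
  </xsl:template>
</xsl:stylesheet>

Applied to Listing 21.4, the first line of output should read Golf - Volkswagen. The manufacturer attribute on the model elements plays no part in the selection.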

If you have experience with databases, Listing 21.1 is the obvious choice because you are used to tables with related data. Listing 21.4 is not something you might have come up with because the relationships are hierarchical. XML is, in essence, a hierarchical data format, so hierarchic relationships have preference over other types of relationships. Selecting data is much easier and will probably also perform better, especially with large datasets.

Elements or Attributes?

The debate whether you should use elements or attributes is as old as XML itself. Some people think you shouldn’t use attributes at all, only elements. Their argument is that an attribute is just an element that might occur only once and might have only a text value. You can enforce both these qualities with a DTD or a Schema, so why bother with a different notation, which affects XML, DOM, and XSLT? One answer is, of course, that you can enforce these properties with an attribute without having to define a DTD or Schema. There are, however, other considerations between elements and attributes as well.

Attributes take less space in a document than an element because an element needs to have an opening and a closing tag, whereas an attribute needs only quotation marks and the = character. When a document is large, using attributes can save quite a large amount of space. In a networking environment where bandwidth is a factor, shaving off 20% of a document’s size might be very important, specifically if the document needs to be sent over the wire many times, for instance, in a Web-based scenario. If you pay per gigabyte of data sent in such a scenario, a document that is 20% smaller means a savings of roughly 20% on transfer cost.

From a design point of view, there are two important differences between elements and attributes. An element can occur more than once as a child element of another element. In table-like data, as shown in the preceding sections, this is very important, but it also holds true in less structured or hierarchical data structures, such as a name element with several firstname child elements. Attributes, on the other hand, can occur only once. Having two attributes with the same name is not allowed. So, in some cases, you can store data only as an element. Also, an element is extensible. An attribute can contain only a number or string value, but an element can be extended with additional child elements. This concept is very important because if you choose an attribute and create all stylesheets accordingly, extending the data structure is a tough job. If you use an element, on the other hand, you can add child elements if the need arises.

In essence, attributes are a good choice when you’re sure that you can have only one of them, and you will never need to extend them. Values, such as unique identifiers, qualify very well for attributes.
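
The following fragment sketches that guideline. The person element and its id value are hypothetical; the name structure is the one used earlier in this lesson:

<!-- id: exactly one per person and never extended, so an attribute fits -->
<person id="p001">
    <name>
        <!-- firstname is an element so that it can be repeated or extended later -->
        <firstname>Michiel</firstname>
        <lastname>van Otegem</lastname>
    </name>
</person>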

Caution

If you choose to use elements, you need to be aware of side effects that occur when you select the value of a node-set or when no matching template exists. In that case, the element value is written to the output, which is probably not what you intended.

One Document or Multiple Documents?

Whether you should use one or multiple documents to store data is not an easy question to answer. There are several considerations:

• Can the data be structured so it can be divided into several documents?

• How large is the entire dataset?

• How often does the data change?

• Does breaking up the data into smaller pieces make testing (parts of) the application easier?

• What is the impact of multiple documents on the complexity of the expressions?

If the data is hierarchical in nature, dividing it into several files is not easy. Listing 21.4 is very hard to break up because you need each manufacturer and the related data to be able to process the document properly. Dividing it into several pieces is not possible, apart from creating a different file for each car model for a manufacturer. Dividing the data that way would almost certainly make your application hard to manage, so that approach is not an option, unless the number of car models per manufacturer is large. If, on the other hand, the data is structured as in Listing 21.1, you can easily divide it into two separate files, one with the manufacturers and one with the cars. When you load one or the other into a variable in the stylesheet, you can access the data in the variable by using the reference values in the other document. The downside is, of course, that this will make your expressions more complex because you have to use predicates to get to the right data.
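
A minimal sketch of that approach follows. It assumes that the models element of Listing 21.1 is the root of the source document and that the manufacturers element has been saved as the root of a separate file. The file name manufacturers.xml is hypothetical:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Load the manufacturer elements from the second file into a variable -->
  <xsl:variable name="manufacturers"
    select="document('manufacturers.xml')/manufacturers/manufacturer" />

  <xsl:template match="/">
    <xsl:apply-templates select="models/model" />
  </xsl:template>

  <!-- The predicate compares the reference value with the ids in the variable -->
  <xsl:template match="model">
    <xsl:value-of select="concat(@name,' - ',$manufacturers[@id = current()/@manufacturer]/@name,'&#xA;')" />
  </xsl:template>
</xsl:stylesheet>

When document() is called with a string argument like this, a relative file name is resolved against the location of the stylesheet, so the second file would normally sit next to it.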

If the dataset is very large, and you don’t always need all the data, breaking up the data into several files is a good idea. The less data the processor has to sift through, the better the performance of your application. In addition, it is probably much easier to test and debug a section of your application with a subset of the data. If you can make sure that all separate sections of your application are correct, you can limit a search for errors in the entire application to the code that is required to use the sections as one large application. This will undoubtedly save you a lot of time.

NEW TERM

Another consideration here is how many users need concurrent access to the data. Unless you’re working with some kind of database, concurrent access to the data is tricky at best, especially if the data needs to change on a regular basis. In that case, smaller files might help because it is more likely that you can open a file to make changes. You need to be careful if you have files with related data, however; if you change one file, the other might not have the referenced data until you change that file as well. With most stylesheets, this is not a problem because, unlike a database, a stylesheet doesn’t enforce referential integrity, which means that a reference might exist without the referenced data existing.

One point you need to keep in mind when working with multiple documents is that if you use matching on a secondary document loaded with the document() function, the data in the original source document is not available unless it, too, is stored in a variable. This also goes for keys. A call to key() searches only the document that contains the current context node, so a key lookup cannot reach directly from one document into another. In essence, the more documents you have, the harder it becomes to use data from those documents in concert.
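
Because key() looks in the document that holds the context node, one commonly used idiom is to change the context to the secondary document before calling key(). The fragment below is a sketch of that idiom, not a technique this lesson prescribes. It assumes a hypothetical second file named manufacturers.xml that holds the manufacturers element of Listing 21.1, and the key name manufacturer-by-id is made up for illustration:

<xsl:key name="manufacturer-by-id" match="manufacturer" use="@id" />

<xsl:template match="model">
    <!-- Remember the reference value before the context changes -->
    <xsl:variable name="id" select="@manufacturer" />
    <xsl:value-of select="@name" />
    <xsl:text> - </xsl:text>
    <!-- Iterating over the single document node only switches the context -->
    <xsl:for-each select="document('manufacturers.xml')">
        <xsl:value-of select="key('manufacturer-by-id', $id)/@name" />
    </xsl:for-each>
    <xsl:text>&#xA;</xsl:text>
</xsl:template>

The extra variable and the context switch illustrate the added complexity that comes with each additional document.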

Using Namespaces

In a simple, small application, namespaces are often more trouble than they are worth. If, however, you create an application consisting of different datasets with disjoint vocabularies, possibly consisting of multiple documents, namespaces are an absolute must. Namespaces make sure that you can’t address data that is outside the dataset you work with. Especially when you use more exotic expressions, the chances increase that you’ll select data that comes from a source you didn’t want to get data from. Separating such data with a namespace solves this problem.

Even if you have a document that consists of only elements in the same vocabulary, declaring the namespace is still a good idea. The best way to do so is to declare the namespace as the default namespace so that you don’t have to type the namespace prefix for each element. When you process the document with a stylesheet that might process documents with different namespaces, the prefix in the stylesheet can differ from the prefix in the original document, as long as the namespace name is identical.
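
A sketch of this situation follows. The namespace name, the element names, and the prefix s are placeholders chosen for illustration. The source document declares the namespace as its default, and the stylesheet binds the same namespace name to a prefix:

<!-- Source document: default namespace, no prefixes on the elements -->
<?xml version="1.0" encoding="UTF-8"?>
<basket xmlns="http://www.example.com/xmlns/shop">
  <item>Bordeaux</item>
</basket>

<!-- Stylesheet: the same namespace name, bound to the prefix s -->
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:s="http://www.example.com/xmlns/shop">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="s:basket">
    <xsl:for-each select="s:item">
      <xsl:value-of select="concat(.,'&#xA;')" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

In XSLT 1.0, an unprefixed name in a match pattern or expression refers to a name in no namespace, so match="basket" would not match the basket element of this document. The stylesheet therefore needs a prefix even though the source document uses none.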

You also can use namespaces deliberately to be able to mix data. This way, you can mix data that has a grouping of some sort, so you can address the separate groups as a single group instead of having to spell out each data item. Such mixing of namespaces is shown in Listing 21.6.

LISTING 21.6 XML Document Mixing Namespaces

<?xml version="1.0" encoding="UTF-8"?>
<shop:basket xmlns:shop="http://www.example.com/xmlns/shop"
             xmlns:product="http://www.example.com/xmlns/products">
  <product:product product:ID="234" product:description="Bordeaux"
               product:price="50.00" shop:quantity="1"/>
  <product:product product:ID="123" product:description="Brie"
               product:price="99.95" shop:quantity="3"/>
</shop:basket>

ANALYSIS

Listing 21.6 mixes namespaces to keep apart information from a shop and the inventory. If you want to select only the product information of the first product, you can use the following expression:

/shop:basket/product:product[1]/@product:*

This expression selects only the attributes in the product namespace. The attributes from the shop namespace are ignored. Now consider Listing 21.7.

LISTING 21.7 XML Document from Listing 21.6 Without Namespaces

<?xml version="1.0" encoding="UTF-8"?>
<basket>
  <product ID="234" description="Bordeaux" price="50.00" quantity="1"/>
  <product ID="123" description="Brie" price="99.95" quantity="3"/>
</basket>

ANALYSIS

Listing 21.7 shows the same information as Listing 21.6, but without the namespaces. Now no distinction exists between the data that belongs to the product itself or to the shop. If you want to get only the product information of the first product, you have to use either of the following expressions:

/basket/product[1]/@ID | /basket/product[1]/@description |
/basket/product[1]/@price

or

/basket/product[1]/@*[name() != 'quantity']

The latter expression is more versatile because it enables you to add product data, which will still be matched by that expression. The former expression, which is quite lengthy, selects only the specific attributes ID, description, and price.
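
For comparison, the following stylesheet is a sketch of how the namespace-based selection might be used against Listing 21.6. It assumes the two namespace names from that listing; the output layout is an arbitrary choice:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:shop="http://www.example.com/xmlns/shop"
  xmlns:product="http://www.example.com/xmlns/products">

  <xsl:output method="text" encoding="UTF-8" />

  <xsl:template match="/">
    <!-- product:* matches every attribute in the product namespace -->
    <xsl:for-each select="/shop:basket/product:product[1]/@product:*">
      <xsl:value-of select="concat(local-name(),': ',.,'&#xA;')" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

The shop:quantity attribute is skipped without being named, and any attribute added later in the product namespace is picked up automatically. The order in which the attributes are written can differ between processors.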

Design Tools

XML design has far less dedicated tool support than database design. The reason for this is simple: XML is a flexible data format in which one format isn’t necessarily worse than another. Design tools for applications and databases all work around the premise that an application is structured in a certain way. This is, of course, true, and applications that use XML as a data source also conform to the same structure. The underlying XML structure, however, is the domain of data modeling tools. All current data modeling tools are geared toward database design rather than XML design. XML design therefore is more or less still considered an art. The preceding sections gave you insight into the tools that you, as the artist, have to work with. With time, experience will teach you how to best use these tools.

XML Schemas also can help you in designing XML structures. XML Schemas define vocabularies and XML structures. The primary concern of a Schema is validation of a document; however, a Schema is self-documenting. By using a stylesheet, you can gather all kinds of information from a Schema. You can download a stylesheet that documents a Schema from http://msdn.microsoft.com/downloads/sample.asp?url=/msdnfiles/027/000/539/msdncompositedoc.xml.

Michael Corning is a pioneer in the field of application design and programming based on XML Schemas. The method of programming he promotes is called Schema-Based Programming (SBP), which he has closely linked to a design framework called the Model-View-Controller framework (MVC). This framework separates the data, the view, and the interaction control into three separate pieces. Corning argues that the whole application builds on the data, or the Model, if you will. Different views of that data, which are controlled by the Controller, enable interaction with the user. This idea is similar to three-tiered architectures in distributed applications, where the data is separated from the logic, which in turn is separated from the display. With MVC, this separation is different, and concentrates around the XML data model. You can find out more details about SBP and MVC at http://www.aspalliance.com/mcorning/. This site also contains downloadable source code for different implementations of an SBP/MVC application.

Designing XSLT

In the preceding section, you learned about the issues involved in designing XML documents. Some of the discussion was related to how easy or hard it is to perform tasks in XSLT. Most of those ideas are equally applicable to solutions using the Document Object Model (DOM) to access XML data. The next step is to look at XSLT itself and examine the considerations for designing a stylesheet. Although this topic is very much linked to the design of XML documents, these considerations apply to most stylesheets, regardless of the XML structure.

XSLT Design Considerations

When you design stylesheets, one of the most important goals you want to accomplish is that you can alter sections of a stylesheet without affecting other sections. Related to that goal is the possibility of reusing sections of stylesheets you create in other stylesheets. When you design stylesheets as part of larger systems, reusing sections becomes even more important because you don’t have to reinvent the wheel each time. It also guarantees that across your application or applications the same data is processed consistently; especially in a Web site where the formatting needs to be consistent for all the pages, this is important. Mechanisms that can help you are variables, attribute-sets, templates, and so on, but when do you use which? The following section answers that question for you. As with XML design, the answers are not universal truths, but just guidelines.

Setting Up a Stylesheet Base

When you start a stylesheet, you start with the foundation. This foundation is the XML prolog, which tells any parser or processor that it is dealing with XML and what the encoding method of the document is. Next is the xsl:stylesheet element, which is also straightforward, unless you use extensions, in which case you have to declare the namespaces involved. Although you aren’t required to declare namespaces until you actually use them, including any namespace declarations in the xsl:stylesheet element is good practice. This way, you can make sure that all developers can see at once which namespaces a stylesheet processes. Any elements in the source documents that use a namespace not declared in the stylesheet will not be processed.

Caution

The version attribute should always have the number of the current World Wide Web Consortium (W3C) Recommendation or lower. Using version numbers that haven’t achieved Recommendation status yet is not a good idea. Only when a version has become a Recommendation are all elements and behaviors final.

Other important parts of your stylesheet base are the elements that deal with output encoding, whitespace handling, and so on. For text output, adding these elements is simple because you can specify only the encoding. Specifying the media type doesn’t make sense in most situations. When you’re creating text output, the best thing you can do is strip all nonsignificant whitespace from the source document and insert linefeeds and so on yourself. This way, you have 100% control over what the output will look like. To achieve this, your stylesheet should start with the following code:

<xsl:output method="text" encoding="UTF-8" />
<xsl:strip-space elements="*" />

The preceding code uses UTF-8 encoding, which is probably the most common. Unless you do something really out of the ordinary, or are working on a system that doesn’t support it, keep it that way.

When creating XML or HTML output, you have many more options. Fortunately, these options aren’t all very interesting in most cases. What is important is that you always specify the version you want to create. For XML, specifying the version doesn’t make much sense now, but creating applications isn’t just about here and now; XML is going to be around in the future, and that means versions might change.

The cdata-section-elements attribute of the xsl:output element is important when you have designed a document to contain CDATA sections. When you create a document that has to conform to that design, you need to list the elements whose text content must be written as CDATA sections. Although you can add this list later in the game, adding it before you do anything else is good practice.
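As a minimal sketch of how this attribute is used, the following stylesheet lists two element names in cdata-section-elements. The element names script and description are assumptions made up for this illustration; substitute the names from your own document design.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text children of script and description elements (hypothetical names)
       are written to the output as CDATA sections. -->
  <xsl:output method="xml" version="1.0" encoding="UTF-8"
    cdata-section-elements="script description" />

  <xsl:template match="/">
    <item>
      <!-- The escaped characters below end up unescaped inside a CDATA section. -->
      <description>Brie &amp; Camembert &lt;soft cheeses&gt;</description>
    </item>
  </xsl:template>

</xsl:stylesheet>
```

The attribute takes a whitespace-separated list of element names, so you can extend the list as the document design grows without touching any templates.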

When you’re setting up a stylesheet base, a smart move is to add a template that matches all elements but does nothing. This template makes sure that data from unmatched elements isn’t sent to the output, so when you use xsl:apply-templates, only elements that are explicitly matched produce output. The base for XML output therefore will look something like Listing 21.8.

LISTING 21.8 Stylesheet Base for XML Output

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" />

  <xsl:template match="/">
    <xsl:apply-templates />
  </xsl:template>

  <xsl:template match="*" />
</xsl:stylesheet>

Matching, Calling, or Iteration?

Whether you should use matching, calling, or iteration is the most important question when you’re designing stylesheets. Your decisions will have a lasting effect on your application.

Matching is the key concept around which XSLT was designed, and it has some major advantages over iteration. The fact that XSLT was designed for matching doesn’t necessarily mean that matching is faster than iteration, but you can safely assume that matching will be at least as fast as iteration. More importantly, templates are units of coding that can easily be changed. You could argue that the code inside an xsl:for-each element is equally changeable, but that is not quite true because the element is embedded in a template. More important is the fact that an iteration is bound to the template it is used in, so it is sensitive to the context of the template it is used in. A template, on the other hand, can be used independently of context, which means that it can be reused in separate sections of your stylesheet or application. If you want to enforce context on the template, you can do so by using the select attribute of the xsl:apply-templates element and different modes, if necessary.
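The reuse argument can be sketched with a small stylesheet. The source structure assumed here (a shop element containing order elements with a client attribute, each holding item elements with product and quantity attributes) is hypothetical and serves only as an illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />
  <xsl:strip-space elements="*" />

  <xsl:template match="/">
    <!-- Per-order listing: items are applied from within each order. -->
    <xsl:apply-templates select="shop/order" />
    <!-- Flat listing: the same item template is reused from another context. -->
    <xsl:text>All items:&#10;</xsl:text>
    <xsl:apply-templates select="shop/order/item" />
  </xsl:template>

  <xsl:template match="order">
    <xsl:text>Order for </xsl:text>
    <xsl:value-of select="@client" />
    <xsl:text>:&#10;</xsl:text>
    <xsl:apply-templates select="item" />
  </xsl:template>

  <!-- An independent unit: it doesn't depend on which template applied it. -->
  <xsl:template match="item">
    <xsl:value-of select="@quantity" />
    <xsl:text> x </xsl:text>
    <xsl:value-of select="@product" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Had the item formatting been written inside an xsl:for-each in the order template, the flat listing would need its own copy of that code. With matching, both listings share one template.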

Does this mean you should never use iteration? No, most certainly not. Every advantage has a disadvantage. In the case of matching templates, the advantage of being generic is also its disadvantage. If you want to do something specific only to the current context, using xsl:for-each is a good choice. Using different types of matching in that case would only complicate matters. Matching the contents of a variable, for instance, when you’re working with multiple documents is tricky. When you iterate through a variable’s content, however, using data that is outside the scope of the variable is much easier. With templates, you would have to use parameters to get around this problem.

If the question is “matching or iteration?” where does calling templates fit in? Called templates have two functions: One is to break functionality of a large template into smaller pieces. If you’re using matching, this probably doesn’t happen much because the templates are already pretty lean. With iteration and calculation, the situation is different, however. The other function of a called template is to solve problems that you can’t solve with matching or iteration. These problems require a template to act as a function of some sort. With variables and parameters, you can make a template return data based on the value of one or more parameters. Although this is also possible when matching templates, you have much less control over the actual result. Recursive solutions, for instance, require called templates with parameters. Don’t grab for recursion too quickly. You should treat it as a last resort if all else fails. In many cases, you also can find a solution using matching or iteration, but it might be less obvious. Recursion puts a strain on the processor that you want to avoid, especially if the source data is large. In that case, hundreds of recursive calls can occur, and some processors might not be able to handle them.
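A called template acting as a function might look like the following sketch. It totals quantity times price over a set of items, something the sum() function can’t do on its own in XSLT 1.0. The template name total, its parameters, and the order and item structure with quantity and price attributes are all assumptions for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />
  <xsl:strip-space elements="*" />

  <xsl:template match="order">
    <xsl:text>Total: </xsl:text>
    <xsl:call-template name="total">
      <xsl:with-param name="items" select="item" />
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Hypothetical helper: returns the sum of quantity * price for all items. -->
  <xsl:template name="total">
    <xsl:param name="items" />
    <xsl:param name="sum" select="0" />
    <xsl:choose>
      <!-- Items left: add the first one to the running sum and recurse on the rest. -->
      <xsl:when test="$items">
        <xsl:call-template name="total">
          <xsl:with-param name="items" select="$items[position() &gt; 1]" />
          <xsl:with-param name="sum"
            select="$sum + $items[1]/@quantity * $items[1]/@price" />
        </xsl:call-template>
      </xsl:when>
      <!-- No items left: the running sum is the result. -->
      <xsl:otherwise>
        <xsl:value-of select="$sum" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Each item costs one recursive call, which is exactly why large source documents can become a problem for some processors.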

Variables and Attribute-sets

Variables are a fact of life in some applications, particularly those that process multiple documents. Variables are well suited for dynamic data, data that depends on the source document, the current context and scope, and so on. In all these cases, no alternatives exist. When you use matching on the contents of a variable, it is important that you keep in mind that the context changes from the source document to the variable. This means that the data from the source document can’t be accessed, not even through a key. I have stated this point before, but it is so important that it doesn’t hurt to tell you again. I have lost a lot of time in projects because I forgot that I was matching a variable, so my expressions selecting data from the source document drew a blank.

When it comes to static global data, you have an alternative for variables: attribute-sets. Attribute-sets are much more flexible than variables when it comes to adding values as attributes to an element because you can create attribute-sets from other attribute-sets and override the data in some attributes. The downside to attribute-sets is that the names of the attributes have to be known beforehand. When you use a variable to store a data value, you can create differently named attributes from that same value. Whether doing so is a good idea is debatable, however, because it goes against keeping stylesheets and XML documents consistent.

Another drawback of attribute-sets is that they can contain only attributes, so if you want to insert entire elements or element structures, you need to use variables anyway.
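The way attribute-sets build on each other can be sketched as follows. The set names cell and number-cell and the item element with product and quantity attributes are made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" />

  <!-- Base set: static attribute values shared by all table cells. -->
  <xsl:attribute-set name="cell">
    <xsl:attribute name="align">left</xsl:attribute>
    <xsl:attribute name="valign">top</xsl:attribute>
  </xsl:attribute-set>

  <!-- Derived set: includes everything from cell, then overrides align. -->
  <xsl:attribute-set name="number-cell" use-attribute-sets="cell">
    <xsl:attribute name="align">right</xsl:attribute>
  </xsl:attribute-set>

  <xsl:template match="item">
    <tr>
      <td xsl:use-attribute-sets="cell">
        <xsl:value-of select="@product" />
      </td>
      <td xsl:use-attribute-sets="number-cell">
        <xsl:value-of select="@quantity" />
      </td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

Note that the attribute names align and valign are fixed in the sets themselves, which is the downside mentioned earlier: the names have to be known beforehand.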

Multifile Stylesheets

I don’t have much to say about multifile stylesheets other than what I already stated on Day 13, “Working with Multifile Stylesheets,” except that using them is a good idea if your stylesheets become larger and different stylesheets have templates in common. Think about creating stylesheet libraries that contain templates you use throughout your application. Such a library can save you a great deal of time. By using xsl:apply-imports, you can also create incremental templates. A template in the importing stylesheet that matches the same node as an imported template can invoke that imported template. The importing template thereby inherits the functionality of the imported template and can add functionality to it.
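An incremental template could be set up along these lines. The filenames base.xsl and main.xsl and the product element with a name attribute are hypothetical. The first file is the library stylesheet:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- base.xsl: library stylesheet (hypothetical filename) -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Basic formatting of a product name. -->
  <xsl:template match="product">
    <b><xsl:value-of select="@name" /></b>
  </xsl:template>

</xsl:stylesheet>
```

The second file imports the library and builds on its template:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- main.xsl: importing stylesheet (hypothetical filename) -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must come before any other top-level element. -->
  <xsl:import href="base.xsl" />

  <!-- Overrides the imported template but reuses its output. -->
  <xsl:template match="product">
    <li>
      <xsl:apply-imports />
    </li>
  </xsl:template>

</xsl:stylesheet>
```

The template in main.xsl takes precedence for product elements, and xsl:apply-imports hands the current node to the template in base.xsl, so the list item wraps whatever the library produces.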

Error Handling

Error handling in XSLT is hardly needed. If your stylesheet is syntactically correct, it will run against any well-formed source data. The only exception occurs when you use extension functions because they don’t have the same constraints as XSLT functions. Error handling in XSLT is therefore more a concern in the sense that a stylesheet doesn’t produce the correct result if the source data is different from what you expected. For instance, number calculations yield the value NaN if a data value is missing. In many cases, you would actually like such a case to be treated as if the value were zero. This means that you have to check the value before the calculation takes place. The best way to do so is pre-emptively. Before you start to process a piece of data, check whether it conforms to the structure and values that you need. If the calculation needs to take place even if this isn’t the case, create a variable that has the correct structure from scratch or from the data to be processed. That way, you’re sure that the result is always correct. Listing 18.8 used this mechanism. The relevant section of Listing 18.8 is shown in Listing 21.9.

LISTING 21.9 Partial Stylesheet Showing Pre-emptive Error Handling

1:  <xsl:variable name="points">
2:    <xsl:choose>
3:      <xsl:when test="$result/points[@team = $id]">
4:        <for><xsl:value-of
5:                  select="$result/points[@team = $id]/@for" /></for>
6:        <against><xsl:value-of
7:                      select="$result/points[@team = $id]/@against" />
8:        </against>
9:        <xsl:value-of select="$result/points[@team = $id]" />
10:     </xsl:when>
11:     <xsl:otherwise>
12:       <for>0</for><against>0</against>0
13:     </xsl:otherwise>
14:   </xsl:choose>
15: </xsl:variable>

ANALYSIS

Line 3 in Listing 21.9 checks whether the needed points element exists. If it does, the values are taken from it; otherwise, the values created on line 12 are all zero. The whole result is stored in a variable that can be used later in calculations. Because the values are always numbers, you don’t have to worry about a calculation yielding NaN when the points element checked on line 3 doesn’t exist.
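Listing 21.9 guards against a missing element. The same pre-emptive idea can be sketched for a single value that is missing or not numeric. The template name number-or-zero and the item element with price and quantity attributes are assumptions for this illustration, not part of Listing 18.8.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8" />
  <xsl:strip-space elements="*" />

  <!-- Hypothetical helper: returns the value as a number, or 0 if it isn't one. -->
  <xsl:template name="number-or-zero">
    <xsl:param name="value" />
    <xsl:choose>
      <!-- NaN is never equal to itself, so this test fails for missing
           or non-numeric values. -->
      <xsl:when test="number($value) = number($value)">
        <xsl:value-of select="number($value)" />
      </xsl:when>
      <xsl:otherwise>0</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="item">
    <xsl:variable name="price">
      <xsl:call-template name="number-or-zero">
        <xsl:with-param name="value" select="@price" />
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="quantity">
      <xsl:call-template name="number-or-zero">
        <xsl:with-param name="value" select="@quantity" />
      </xsl:call-template>
    </xsl:variable>
    <!-- Both variables are guaranteed to hold numbers at this point. -->
    <xsl:value-of select="$price * $quantity" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Whether zero is the right fallback depends on the application; in some cases, skipping the item or reporting it is the better choice.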

XSLT Design Do’s and Don’ts

The previous sections discussed the differences between various options you have when designing stylesheets. When to apply which option is something you need to learn from experience. You will find that the more you work with XSLT, the more options you will have used once or twice. You will most certainly find out what works well, at least for you. I indeed have found that some solutions work better for me than others, and with that, I have developed a few Do’s and Don’ts for myself that might help you.

• Use templates and matching as much as possible. Matching is at the core of XSLT and therefore works best.

• Make your templates as small as possible, without breaking them up into called templates. Smaller templates are easier to maintain, will force you to keep templates simple, and are easier to reuse.

• Use attribute-sets and imported stylesheets to increase the ability to reuse functionality.

• Don’t use multiple source documents unless you absolutely have to. Using multiple documents complicates things, so if you don’t need them, don’t use them.

• Use only local variables and parameters, if possible. Local variables don’t suffer much from scoping problems; global variables do and are therefore not handy in situations in which you reuse functionality.

• Don’t use recursion unless there is no other way to solve a problem. Recursion puts a strain on the processor, and a processor is much more likely to fail on deep recursion than on matching or iteration.

• Always use xsl:output, xsl:strip-space, and xsl:preserve-space to control the output format and whitespace. If you don’t explicitly set the conditions for output, not all processors will produce the same output.

• Never assume that your expression will always yield a value if it has to. XML documents in the real world are likely to be flawed, so you should check that the values you expect are really there. Only if you’re sure that a document is validated by a Schema or DTD can you leave out such checks.

• Always create (and use) a stylesheet base. This will immediately restrict the number of mistakes you can make.

• Don’t assume that your application will always stay the same. Design an application so that it can be extended, either by you or by others.

Summary

In this last lesson, you learned that designing XML and XSLT is mostly about experience. Although there are general rules, satisfying all conditions that result in well-designed XML and XSLT documents is impossible because some of the conditions are in conflict.

The most important aspects of XML and XSLT design are defining the hierarchy of the XML document and using matching in a stylesheet. These aspects go hand in hand because a good hierarchy makes it easier to match nodes. When you need to split up an XML source into several files, you will have to compromise on this hierarchy and use reference values. Doing so will have an adverse effect on the simplicity of your stylesheets. Even so, if you can still solve your problems with matching instead of resorting to iteration, you should try to do so because matching templates ensure that you have a stylesheet that consists of independent units.

Q&A

Q I come from a database background, and I’m used to working with database design tools. Can I use these tools for designing XML?

A Yes. Databases aren’t suited for hierarchical data, however, so you need to be aware of that limitation when you design your documents. If you plan to separate documents similarly to database tables, the tools might be a big help.

Q I’m used to working with an object-oriented design tool. Can I use it to design XML?

A The answer is debatable. XML structures can resemble object-oriented structures, but most often they do not. Using such a tool is like trying to draw a picture while wearing a straitjacket.

Q Is it better to design XML first and then XSLT, or are they linked so much that I need to do them together?

A The answer depends on the situation. However, it never hurts to keep in mind what impact a certain XML design will have on stylesheets you might create for it.

Workshop

This workshop tests whether you understand all the concepts you learned today. It is helpful to know and understand the answers before starting tomorrow’s lesson. You can find the answers to the quiz questions and exercises in Appendix A.

Quiz

1. True or False: Matching is better than iteration.

2. True or False: With a hierarchical relationship, you don’t need reference values to refer to data in other elements.

3. Why would you use attributes instead of elements?

4. What is the biggest problem when you work with multiple source documents?

5. Why would you use a namespace in a document that uses a single vocabulary?

Exercise

1. Create an efficient XML document for the following data:

  Products
  Red wine: Bordeaux
  Red wine: Ruby Cabernet
  White wine: Soave
  Red wine: Chianti
  Red wine: Merlot
  Cheese: Camembert
  Cheese: Gouda
  Cheese: Brie
  Cheese: Mozzarella
  Cheese: Feta

  Order 1
  Client: John Doe
  Items: 6 Ruby Cabernet wines
         4 Chianti wines
         2 Brie cheeses

  Order 2
  Client: Michiel van Otegem
  Items: 4 Bordeaux wines
         12 Merlot wines
         2 Camembert cheeses
         5 Mozzarella cheeses
