WEEK 1 Day 1
Getting Started with XSLT

With more and more people using Extensible Markup Language (XML) in their applications, the need arises for a generic language to manipulate XML documents. Extensible Stylesheet Language Transformations (XSLT) is this language. It was developed by the World Wide Web Consortium (W3C) and now has Recommendation status, the closest you can get to a standard on the World Wide Web.

Because XML documents themselves don’t contain any formatting information, you need something else to format and display the data so that it looks pleasant. With XSLT, you have a language to manipulate an XML document. From that document, you then can create another document that contains formatting information, such as Hypertext Markup Language (HTML), Portable Document Format (PDF), or Rich Text Format (RTF). In addition, you can use XSLT to restructure XML documents.

Today you will learn the following:

• What XSLT is

• What the benefits of XSLT are

• How XSLT performs transformations

• Which tools to use to create XSLT documents

• How to use processors to perform transformations

Overview of XSLT

In the past few years, the World Wide Web has exploded in size. You might find that the sheer number of Web sites and pages is incomprehensible. Because these pages contain only formatting information and almost no information regarding their content, finding the information you need becomes harder as the Web grows. Pick a search engine on the Web and enter a topic. Chances are you’ll get thousands of results, most of them irrelevant. In addition to this problem, nearly all information on the Web is encoded in HTML, which was specifically designed to format text for display in browsers. Using this format is fine if you’re viewing pages with a browser, but applications designed to process information have a lot of trouble working with HTML because the data in an HTML document has no meaning, and HTML is too unstructured to easily retrieve the data stored in it.

Enter the Extensible Markup Language, or XML. Although much like HTML, it doesn’t contain formatting information; instead, it contains information about the meaning of data in a document. This effectively means that any document written in XML provides the meaning of its data as part of the document.

Note

Throughout this book, I will use the term XML document for both XML files and XML stored in other forms, such as a string in memory or in a database.

The obvious benefit is that this XML data can be extracted and matched more closely to a search query. The query thus displays the information you actually want and discards everything that is irrelevant. A search engine using XML information can also ask you to refine your query—for instance, to determine whether you meant computer chips or French fries when you entered the search word chips. The idea is that eventually the Web will change from pages of text into the semantic Web, where all pages have meaning, not only to people but also to applications.

Although XML is oriented toward the Web and initiated by the W3C, it is not meant for use on the Web alone. It can be used in all sorts of applications, both for storing data and as a means of communication between applications. This usage might seem a little odd, as there has always been a distinction between storing and communicating data. However, it is actually quite natural: Data is data, no matter where you use it. This concept is gradually gaining ground, as more and more vendors use XML in their applications. Microsoft, for instance, now has XML support in most of its products, in one way or another. In fact, the .NET Framework, which has been developed to run most future applications and services, is more or less built around XML. Other vendors embracing XML include Sun and IBM, which use it in various applications.

Introduction to XML and XSLT

An XML document looks a lot like an HTML document. Like HTML, XML uses tags that have bearing on what is inside them, as in this example:

<title>Teach Yourself XSLT in 21 Days</title>

A major advantage is that XML is just text, not some proprietary or binary data format. This means that you can read and edit it with a text editor; an added advantage is that any computer can read it and retrieve data from the document. The latter advantage is made possible by the fact that the tags around the data tell the computer the meaning of that data. The downside is that formatting information is no longer associated with the data, so displaying the data nicely for a human reader is not possible when only XML is used. You might be able to understand and edit XML, but it doesn’t look nice, and it certainly isn’t displayed in a manner appropriate for the purpose you are using it for. Consider this book, for example. If each paragraph, header, and so on were tagged with XML, reading it that way would be much harder than reading it the way it’s formatted now. So, to format the data appropriately, you need to manipulate it before it can be displayed.

What Is XSLT?

If you didn’t have a generic tool or language to manipulate XML, formatting it for display would be very hard. You would have to write your own application to read XML and display it in the way you want. You would have to tell your application how to format each different XML tag. So, what if you wanted to change the formatting? You would have to start all over. To remedy this problem, the W3C started development of the Extensible Stylesheet Language (XSL), which is a generic language to manipulate and display data in an XML document.

NEW TERM

XSL consists of XSL Formatting Objects (XSLFO) and XSL Transformations (better known as XSLT). The former, officially still called XSL, is an XML vocabulary that defines elements used to specify how an XML document is to be displayed. An XML vocabulary is a set of XML tags that have been defined for a certain purpose. XHTML is another example of an XML vocabulary.

NEW TERM

XSLT is a language used to manipulate XML structures or documents. It is also an XML vocabulary. The actual manipulation of an XML document with XSLT is called transformation. Transformation is the process of creating a new document based on the original document. This process does not change the source document.

XSLT is extremely versatile and can be used to convert XML to many other forms. All transformations result in a new tree structure. XSLT can even be used to create XSLFO documents, which are useful for creating documents that native applications can act upon. XSLFO documents can be used to create Adobe PDF or Microsoft Word files, for example. The main idea here is that XSLFO is generic and can be used for formatting on different “surfaces,” as it were.

Note

XSLFO has many capabilities for high-quality formatting; this topic is beyond the scope of this book, which covers only XSLT.

What Does XSLT Do?

XSLT transforms an XML document into another document, which can contain XSLFO tags to format the document’s data for display, but this is not required. In fact, you are not required to use XSLT as part of XSL at all. Just as XSL is designed for use by many applications, XSLT is designed to transform to many different outputs. So, like XSL, XSLT is a generic language to be used by many applications across many platforms. With XSLT, you can create HTML, XHTML, plain text, PDF (by way of XSLFO), and a number of other document types. You also can use XSLT to transform an XML document into another XML document with a different structure. You may not see the benefit of this capability just now because you can simply use the data from the original document. You will find, however, that transforming into a different XML document is actually very powerful and useful for many applications.
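To make the idea of transforming one XML document into another with a different structure more concrete, here is a minimal sketch in Python. It is not XSLT, and it is not how an XSLT processor works internally; it only illustrates that a transformation builds a new tree and leaves the source alone. The element names (books, book, title, catalog, entry) and the function name restructure are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = """<books>
  <book><title>Teach Yourself XSLT in 21 Days</title></book>
  <book><title>A Second Book</title></book>
</books>"""


def restructure(source_root):
    """Build a new <catalog> tree from a <books> tree (hypothetical names)."""
    catalog = ET.Element("catalog")
    for book in source_root.findall("book"):
        # Each source <book> becomes an <entry> with the title as an attribute.
        entry = ET.SubElement(catalog, "entry")
        entry.set("title", book.findtext("title", default=""))
    return catalog


source_root = ET.fromstring(SOURCE)
before = ET.tostring(source_root)
result_root = restructure(source_root)
after = ET.tostring(source_root)

print(ET.tostring(result_root, encoding="unicode"))
print("source unchanged:", before == after)
```

The same data ends up in a differently shaped document, while the serialized source is identical before and after. XSLT lets you describe this kind of restructuring declaratively instead of writing the loop yourself.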

What Does XSLT Look Like?

XSLT is a programming language that transforms XML documents, but it is unlike most other languages: It differs in look, style, and operation. XSLT is itself XML, and like XSLFO, it is an XML vocabulary. However, its tags don’t tell a program how to display something but rather what to do when it encounters a certain tag. If you have programmed before, some of these tags have familiar names and functions. Other tags will look totally unfamiliar, as you can see in Listing 1.1.

LISTING 1.1 XSLT Sample

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"
    omit-xml-declaration="yes" indent="yes" />
  <xsl:template match="/">
    <xsl:apply-templates />
  </xsl:template>
  <xsl:template match="books">
    <xsl:for-each select="book">
      <xsl:value-of select="title" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Note

Understanding the function of Listing 1.1 is not important right now. This sample merely shows what XSLT looks like.

Note

If you use Internet Explorer 5.0 or higher, you can type in the preceding code and view it. You will see that a stylesheet is XML and has a tree structure.

ANALYSIS

Each tag in XSLT expresses a command to the program performing the transformation. In XSLT, unlike most programming languages, these commands may span multiple lines. In fact, because XSLT is itself XML, it has to conform to the same syntax rules as any other XML document. If you have programming experience with Visual Basic, C++, Java, and so on, this coding may look rather strange because there is no concept of lines performing a certain action. XSLT actually works differently from these languages. Don’t worry about this point for now; I’ll discuss it in more detail later.

XSLT and the XML Family

XSLT relies on and interacts with many of the other members of the XML family. Knowledge of XSLT’s place in the XML family and where it came from will help you better understand XSLT itself and how it can be used. Because the XML family has become rather large and is constantly evolving, the following sections will not provide a roadmap to all XML family members.

A Brief History of XML and XSLT

I won’t bore you with a long and detailed history of XML and XSLT. I’ll just give a brief history so that you can put them in perspective. After all, they didn’t just pop up out of thin air. XML is actually based on concepts that were developed in the early days of computing.

The Roots of XML

XML is based on the Standard Generalized Markup Language (SGML), but SGML is much more complex. In that sense, XML is based more on HTML because the design goal for XML was to be as general as SGML but as easy as HTML so that it would be adopted easily. SGML grew out of the Generalized Markup Language (GML), which was developed in 1969 by Ed Mosher, Ray Lorie, and Charles F. Goldfarb at IBM Research. Although the International Organization for Standardization (ISO) adopted SGML as a data storage and exchange standard in 1986, it was far too complex for widespread adoption. HTML, on the other hand, is the most popular markup language in existence, mostly because of its simplicity.

NEW TERM

XML is actually a simplified subset of SGML and as such is much easier to understand and program with. This ease of use also makes it much easier to create a parser for XML than for SGML, which can account for a lot of XML’s popularity. A parser is a program that can read and understand the syntax and grammar of a language.

Two years after its inception, in February 1998, XML 1.0 became a W3C Recommendation. Since then, work on XML-based languages and systems has taken flight. XHTML, WML, SVG, XPath, XPointer, and XML Query are just some of the XML technologies that have been developed. Some are already W3C Recommendations, whereas others are still under development. XSLT is also one such language, and it became a W3C Recommendation on November 16, 1999.

The Roots of XSLT

Like XML, XSLT is also based on existing concepts. In the early 1990s, an SGML-based standard called Document Style Semantics and Specification Language (DSSSL) was created. This generalized language was meant to be used to manipulate and transform SGML documents to a form that could be displayed or printed. However, because this technology was so complex and hard to use (and thus expensive), only some large publishing houses could afford to create applications for it, for use in high-quality typesetting. SGML and DSSSL approaches are still in use, and as long as no XML/XSLT applications are developed to replace them, they will remain in use.

XSLT has its roots in DSSSL but is much simpler. Although XSLT is based on DSSSL, it is not a real subset of DSSSL, as some XSLT features do not exist in DSSSL.

Other Members of the XML Family

Besides XML and XSL, there are quite a few other members of the XML family—some specific to XML, others shared with technologies such as HTML.

Defining XML Structures

If you want to define an XML structure or vocabulary, you have two options. You can use either a Document Type Definition (DTD) or the more recent XML Schema. The benefit of defining such a structure is that all the documents that conform to the definition will have a predefined structure, making it easier to write applications that use these documents. DTD or XML Schema definitions can be placed in the XML file itself (internal) or in an external file (external). In the latter case, the XML document has to reference the DTD or XML Schema file.

There are two types of parsers:

• Validating parsers

• Nonvalidating parsers

A validating parser raises an error if an XML document does not conform to the rules in the associated DTD or Schema; a nonvalidating parser does not.

Note

Most validating parsers have the option to turn off validation, effectively making them nonvalidating parsers.
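The difference between the two kinds of parsers can be seen with the parser in Python's standard library, which is nonvalidating. The following is a small sketch under that assumption; the sample documents and the helper name parses are made up for illustration. The first document breaks the rules of its own internal DTD but is still accepted, whereas the second is rejected because it is not well formed.

```python
import xml.etree.ElementTree as ET

# The internal DTD says <book> must contain exactly one <title>,
# but the document contains an <author> instead.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title)>
  <!ELEMENT title (#PCDATA)>
]>
<book><author>Someone</author></book>"""

# The <title> element is never closed, which is a syntax error.
NOT_WELL_FORMED = "<book><title>Unclosed</book>"


def parses(text):
    """Return True if the nonvalidating parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses(INVALID_BUT_WELL_FORMED))  # DTD rules are not enforced
print(parses(NOT_WELL_FORMED))          # syntax errors are still reported
```

A validating parser would raise an error for the first document as well, because the content of the book element does not match the DTD.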

A DTD itself is not XML and as such cannot be used by a parser for tasks other than validating an XML document. The DTD cannot be queried as XML. Some parsers offer functions to get information from the DTD, but they are separated from functions that apply to the XML. Also, DTDs provide little information about the data type of values. Only a few data types are known in DTDs, but most applications have many more. You cannot define new data types, so you are stuck with the data types that DTD offers.

Each XML document can have only a single DTD that corresponds to it. Complex DTDs can be created to aggregate other DTDs, but the XML document itself is associated with a single DTD.

To remedy most of the problems with DTDs, the W3C created XML Schemas. XML Schemas, which received Recommendation status on May 2, 2001, are XML and also much more flexible than DTDs. They also define more data types than DTDs do. Many developers were quick to adopt XML Schemas, even before they became a W3C Recommendation.

Telling Apart Vocabularies

By using XML Schemas (or DTDs), you can define XML vocabularies. If you mix vocabularies, however, you may have vocabularies that have elements with the same names. This problem can’t be solved by defining vocabularies alone. XML namespaces have been added to the XML family to solve this problem. XML namespaces are designed to keep XML vocabularies apart. For each vocabulary used in a document, you can define a different namespace with a unique name within the document. This namespace definition can point to an existing XML Schema or DTD, but this is not required. The benefit of pointing the namespace to an actual DTD or XML Schema is that everybody knows what the document is supposed to look like.

You can identify a namespace in an XML document because the namespace prefix appears in front of the element or attribute name. The prefix and the element or attribute name are separated by a colon. Say you define a namespace for books. A title element using the book namespace would look like this:

<book:title>Teach Yourself XSLT in 21 Days</book:title>

You may remember the XSLT sample in Listing 1.1. In that sample, all elements that were part of XSLT were preceded by the namespace xsl. XSLT itself uses an XML namespace to work. A namespace for a document is defined as an attribute of an element—in most cases, the root element. For XSLT, the namespace is defined as follows:

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

The namespace definition has three parts. The attribute name has two parts: the namespace declaration xmlns (a predefined namespace), followed by the namespace you want to introduce (in this case, xsl).

Note

The introduced namespace does not necessarily have to be the same in every document. The xsl namespace in XSLT is just a convention. If you like, you can create another namespace (for example, transform) and use it instead of xsl.

NEW TERM

The last piece of the declaration is a Uniform Resource Identifier (URI), which must be unique for each vocabulary. A URI is a unique name or address for a resource. It can be a Uniform Resource Locator (URL) or a Uniform Resource Name (URN).

The URI could point to a DTD or XML Schema to define the vocabulary that the namespace represents, but this is not required. Also, if no validation is involved, the namespace can contain any element name. The namespace declaration at the beginning of this section specifically points to a URI that defines it as being XSLT. If you use another namespace for XSLT, the only requirement is that it points to the same URI.
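That the URI, and not the prefix, identifies the vocabulary can be checked with a few lines of Python. This is a minimal sketch using the standard library's ElementTree module; the transform prefix is the alternative name suggested in the earlier Note, and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XSLT_URI = "http://www.w3.org/1999/XSL/Transform"

# The same root element, written with two different prefixes.
WITH_XSL_PREFIX = (
    '<xsl:stylesheet version="1.0" xmlns:xsl="%s"/>' % XSLT_URI
)
WITH_OTHER_PREFIX = (
    '<transform:stylesheet version="1.0" xmlns:transform="%s"/>' % XSLT_URI
)

a = ET.fromstring(WITH_XSL_PREFIX)
b = ET.fromstring(WITH_OTHER_PREFIX)

# ElementTree expands each prefixed name to "{URI}local-name",
# so the prefix itself no longer matters after parsing.
print(a.tag)
print(a.tag == b.tag)
```

After parsing, both elements carry the same expanded name, which is why a processor recognizes either document as XSLT.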

Note

You will learn more details about XML namespaces on Day 15, “Working with Namespaces.”

The Document Object Model

If you want to program with XML, you obviously need a way to get to the data. You could write a program that reads the XML from a file and does something with it. This approach would probably yield a proprietary solution. The whole idea behind XML is that it is a standard anybody can use, so creating a proprietary solution isn’t the way to go. What you need is a standard Application Programming Interface (API) to interact with the XML. The W3C created the Document Object Model (DOM) to serve as a standard API. The idea behind DOM is that an XML document is a hierarchical tree of elements and attributes that can be represented in memory and manipulated through a standard mechanism. DOM is not limited to use with XML; it also works with HTML 4.0. Because HTML is less structured than XML, though, you might run into some problems using it with HTML.
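As an illustration of working with a document through DOM, the following minimal sketch uses the minidom module from Python's standard library, which implements a subset of the DOM interface. The sample document, the lang attribute, and the author value are assumptions made up for this example.

```python
from xml.dom import minidom

XML_TEXT = "<books><book><title>Teach Yourself XSLT in 21 Days</title></book></books>"

# Parse the whole document into an in-memory tree.
document = minidom.parseString(XML_TEXT)

# Navigate the tree through the standard DOM interface.
title = document.getElementsByTagName("title")[0]
title_text = title.firstChild.data

# Manipulate the tree: add an attribute and a new element.
book = document.getElementsByTagName("book")[0]
book.setAttribute("lang", "en")
author = document.createElement("author")
author.appendChild(document.createTextNode("Unknown"))
book.appendChild(author)

print(title_text)
print(document.documentElement.toxml())
```

Because method names such as getElementsByTagName and createElement come from the DOM specification, code written against DOM in another language looks very similar.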

Like the other members of the XML family, DOM is evolving. DOM Level 2 has had Recommendation status since November 13, 2000. DOM Level 3 is currently under development.

Simple API for XML

DOM is a standard by decree of the W3C. However, before DOM was around, people were in need of a standard method to use XML. The method that emerged to be the de facto standard, Simple API for XML (SAX), takes an entirely different route than DOM when working with an XML document. Instead of reading the entire document, SAX reads a document an element at a time. Each element that the parser encounters will fire an event. You can attach a routine to the event to act on the element and generate output, if you like.

A big advantage to this approach is that hardly any memory is involved because you don’t need to build the entire document in memory. This approach is extremely useful when you’re working with large documents.
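The event-driven approach can be sketched with the xml.sax module in Python's standard library. The handler class name TitleCollector and the sample document are assumptions for illustration; the point is only that the routines are called as the parser encounters each element, without a tree being built.

```python
import xml.sax

XML_BYTES = (
    b"<books><book><title>First</title></book>"
    b"<book><title>Second</title></book></books>"
)


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleCollector()
xml.sax.parseString(XML_BYTES, handler)
print(handler.titles)
```

The handler keeps only the titles it is interested in, so memory use stays small no matter how large the document is.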

Many parsers use the SAX model. Some others use DOM or, like Microsoft’s MSXML parser, offer a choice between the two.

Addressing Elements and Attributes

Because XML documents have a hierarchical tree structure, you can address elements through a path expression, which is similar to a path expression addressing a file on a file system or in a Web site. With this analogy, you can compare elements of an XML document with folders in a file system and attributes with files.

XPath was developed to address elements and attributes in an XML document. Although XPath is a separate language, it is not used alone; it is always used in conjunction with XSLT or XPointer. The purpose of XPath is to address parts of an XML document. With XPath, you can select a single item or create a path expression that matches several items. This matching capability is very important to XSLT; it is the basis on which elements are selected by XSLT to be transformed. XPointer is the XML equivalent of hyperlinks in HTML. A link in XPointer is defined using XPath.
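To give a feel for path expressions, here is a minimal sketch using Python's ElementTree module. Note the assumption: ElementTree supports only a limited subset of XPath, so this shows the general idea and not the full language that XSLT uses. The sample document and its lang attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<books>
  <book lang="en"><title>Teach Yourself XSLT in 21 Days</title></book>
  <book lang="nl"><title>A Second Book</title></book>
</books>""")

# A path that matches several items: the <title> of every <book>.
all_titles = [t.text for t in root.findall("./book/title")]

# A path with a predicate that narrows the match to one item.
english = [t.text for t in root.findall("./book[@lang='en']/title")]

print(all_titles)
print(english)
```

The first expression reads much like a file system path, and the second adds a condition on an attribute, in line with the folder and file analogy.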

Note

XPath is an essential part of XSLT. You will learn more details about it on Day 3, “Selecting Data.”

The Benefits of XSLT

The benefits of using XSLT are closely related to the benefits of using XML. Assuming that data is either stored or communicated as XML (which is what XML is for), the benefits of XSLT are as follows:

• Retrieving data from an XML document

• Formatting data from an XML document for display

• Translating between an XML document used for communication and a format used within a system

XML and XSLT in Data Storage

When you need to store data, XML is very flexible. You can adjust it to easily fit the type of data it is supposed to store. In that regard, it is much handier than a relational database because a database is limited to related tables. Each table contains rows with a fixed number of columns, with each column having a fixed meaning. The result is that within a database, data is bound to a fixed format. The trouble is that not all data fits nicely in this fixed format. XML has a flexible hierarchical structure that can mimic many, if not all, existing data structures. For example, you can easily create an XML document containing an entire database.
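To show how relational data maps onto XML's hierarchical structure, the following is a minimal sketch that copies a table from an in-memory SQLite database into an XML tree. The table name, the column names, and the element names book-table and row are assumptions made up for this illustration; real databases that export XML use their own formats.

```python
import sqlite3
import xml.etree.ElementTree as ET

# An in-memory table with hypothetical columns, used only for illustration.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE book (id INTEGER, title TEXT)")
connection.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [(1, "Teach Yourself XSLT in 21 Days"), (2, "A Second Book")],
)

cursor = connection.execute("SELECT id, title FROM book ORDER BY id")
columns = [description[0] for description in cursor.description]

# One element per table, one child per row, one grandchild per column.
table = ET.Element("book-table")
for row in cursor:
    row_element = ET.SubElement(table, "row")
    for column, value in zip(columns, row):
        ET.SubElement(row_element, column).text = str(value)

connection.close()
print(ET.tostring(table, encoding="unicode"))
```

Repeating this for every table, with each table element placed under a common root, would yield a single XML document containing the whole database.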

When data is stored in an XML format, XSLT can be used to retrieve data from that document. The fixed format of a database makes it hard to query data from different tables in one operation. Because XML doesn’t have this rigid structure, an XSLT document can easily gather data from different sections in an XML document.

Because XML is so flexible, it can also store relatively unstructured data, such as text documents with some kind of formatting. As I mentioned at the beginning of this lesson, the benefit of detaching the information from the formatting is that you can search based on contextual meaning. With most programming languages, creating formatted output from such a tagged format is hard, whereas XSLT is actually designed to do this. Listing 1.2 shows a sample of such a tagged document that can be formatted with XSLT.

LISTING 1.2 Article Tagged in XML

     <document xmlns:code="http://www.aspnl.com/xmlns/code" xml:lang="en-us">
       <title>Hello world sample</title>
       <text>
         This sample shows how to use <keyword>Response</keyword>.
         <keyword>Write</keyword> to write text to a <device>browser</device>
       </text>
       <code:block multiline="yes" type="lesson" subject="ASP"
         name="write" language="ASP">
         <code:html>
          &lt;html&gt;
          <tab />&lt;body&gt;
         </code:html>
         <code:asp>
          <code:keyword>Option</code:keyword>
          <code:keyword>Explicit</code:keyword><br />
          <br />
          <code:comment>'declare variable (s)</code:comment>
          <code:keyword>Dim</code:keyword> strWrite<br />
          strWrite = "Hello World!"<br />
          <code:object>Response</code:object>.
          <code:method>Write</code:method> strWrite
         </code:asp>
         <code:html>
          <tab />&lt;/body&gt;
          &lt;/html&gt;
         </code:html>
       </code:block>
     </document>

ANALYSIS

The XML in Listing 1.2 is part of a document used to create HTML for a Web site with color-coded code samples. You may be wondering why I didn’t create the HTML directly, because creating the HTML from XML requires an extra step. The answer is that the XML is used to create not only the HTML files, but also the index file and the code samples that people can run to see the result. Also, the same XML can be used to create a printed manual that could be used in a class. The separate files are created using XSLT. Each output type is created using a different XSLT document, and each XSLT document is a template for its output type. Any document that uses the same XML tags can be transformed into that output type using the same XSLT document, as depicted in Figure 1.1. Although in the beginning you need to do some extra work creating the XML and XSLT, in the end you will save a lot of time because you can reuse the XSLT to create all the files you need. This makes XML and XSLT very useful in document management and Web site management scenarios. The latter case is especially true for Web sites that target multiple platforms, such as handheld devices, in addition to regular browsers.

FIGURE 1.1 XML transformation to multiple outputs.

Besides targeting multiple platforms using one XML document and several XSLT documents, you can also create a unified look and feel for multiple XML documents. That means you have to create only one XSLT document that covers the look and feel of an entire site. An additional benefit here is that you can do this for multiple languages. So, creating a Web site that can be viewed in multiple languages is much easier than with other approaches. After all, the data storage is the same and so is the XSLT document.

An additional benefit of storing data in an XML document is that it can be queried using XPath. This procedure works somewhat like selecting data from a database using Structured Query Language (SQL). The XML document therefore is used somewhat like a database itself.

A problem with this usage occurs when you start working with multiple users. If a person editing a document locks it, someone else cannot read from and query it. This makes XML unsuitable for use in a multiuser environment. Databases (and database servers in particular) are designed for use in a multiuser environment but can’t handle XML very well.

You can dump an XML document into a database as a text field, but then you would need to extract it from the database before you could query it. For this reason, XML support is making its way into database technology more and more. Some databases enable you to extract the relational data stored in them as XML. Although this capability is a start, it is not as good as a database that offers native support for XML storage and makes the entire XML queryable with XPath. Such databases do exist and are getting better. They do not yet come close to the speed of relational database systems, but they are improving fast. With these databases, many existing types of applications could be created more easily, and possibly you can create new applications that are as yet beyond your reach. You can find more details about XML databases at

http://www.rpbourret.com/xml/XMLDatabaseProds.htm

XML and XSLT in Communication

NEW TERM

Because XML is a nonproprietary data format that is based on one of the most basic data types, the string, it can be read by almost any computer. This makes it the ultimate data format for communication between systems. Any system equipped with an XML parser can consume and use XML. Web Services are based on this concept. A Web Service is a function provided by one system and usable by another system across the Internet.

A Web Service differs from a Web page in that a Web page is meant for display, whereas the result of a Web Service in most cases is not displayed directly. The result of a Web Service may not see the light of day for a long time after the service has been used, but it is equally possible that the result is used to display a composite result right away.

Web Services can be implemented with a number of technologies. Two of the most-used technologies are XML Remote Procedure Call (XML-RPC) and the Simple Object Access Protocol (SOAP). Both define a protocol (or message format) allowing a system to use a function on another system, as shown in Figure 1.2. Although the two technologies are the same in nature, they use two different XML formats. As long as an application using the service knows which protocol it is dealing with, it can retrieve data from that format. If the application needs to manipulate the data, it either must rely on the XML DOM or use XSLT.

FIGURE 1.2 An application making a function call across the Internet.

The type of communication that XML-RPC and SOAP implement isn’t new. A few existing technologies have the same function. These technologies, including CORBA/IIOP, Microsoft COM/DCOM, and Java Remote Method Invocation, are all pretty much system specific, however. These technologies also do not use the Hypertext Transfer Protocol (HTTP) for communication between systems and therefore do not pass through a regular firewall protecting networks. These methods work only if the firewall is specifically configured to allow other protocols. XML-based communication systems are not system specific, because any system can deal with string data. With XML-RPC and SOAP, the communication can take place through HTTP, so it will work fine, even if a firewall is in place. XML-RPC and SOAP are not restricted to HTTP, however; e-mail or some other means of communicating the messages is also possible.
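To see what such a message looks like, you can have a library build one without sending it anywhere. The following minimal sketch uses the xmlrpc.client module from Python's standard library; the remote function name add and its two parameters are hypothetical and exist only for this illustration.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote function add(2, 3).
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")
print(request_xml)

# The receiving side decodes the same message back into
# the parameters and the method name.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

The printed request is plain XML text, which is why any system with an XML parser can take part in the exchange. A SOAP message carries the same kind of information in a different XML format.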

Some older protocols, such as Electronic Data Interchange (EDI), are adopting XML for communication as well. Chat applications that use XML are already available. Someday you may see most communication protocols replaced by a single, XML-based protocol.

XML can also be used to communicate within an application. A major benefit to this approach is that every component in the system can access data passed within the system in the same manner, no matter what kind of component it is. Data from different kinds of data sources also can be represented in the same way. Therefore, no matter if the data comes from a database, mail server, or another source, the data representation is the same. XSLT can be used to iron out the differences and transform the data into a format suited for all the sources. The benefit is that applications are not confronted with interface differences of the underlying system. This also holds true for functions interacting through XML.

When Not to Use XSLT

Although XSLT is a powerful tool in modern data-driven applications, it is by no means the answer to all your problems. In some situations, other approaches make much more sense. Providing a complete list of situations is not possible, but I will discuss some of the more common applications in the following sections. This information will give you an idea of the situations you should avoid, or at least think twice about. You will need to use this information as well as your experience to judge whether XSLT is the right solution for a problem you’re trying to tackle.

XSLT Performance Problems

XSLT is extremely useful in applications in which data conversion is key, such as document management and publishing applications. Because of its performance cost, XSLT is less useful in applications that require a lot of processing. The performance aspects of XML and XSLT will change, however, as applications become more and more centered around them. Applications that don’t suffer much from these performance problems have a distributed nature, in which the transformation can be performed on the client. With the key browser manufacturers including XML and XSLT support in their browsers, Web applications that use XML and XSLT can utilize the power of the client and possibly reduce network traffic. This approach is far more appealing than a server having to do all the transformation to a format that can be read by the client, as the transformation process in such an application could become a major bottleneck.

A specific scenario that suffers from performance problems is Web sites using XML and XSLT. As yet, the major browsers do not properly support XSLT. Performing transformations in the browser, distributing the load, is therefore not possible yet. Transforming all the XML at runtime in a busy Web site is far too slow to be a viable solution. Preliminary benchmarks with the XSLT processor in Microsoft’s .NET Framework suggest, however, that this situation may change sooner rather than later.

Data Warehousing Applications

Currently, both performance and concurrency issues make XML and XSLT less suitable for data warehousing applications. Although native XML databases exist, they can’t compete with the top relational databases that have been around for a while. If you can structure your data to be stored in a relational database, however, there is the possibility of extracting the data as XML if you use some of the current relational databases. SQL Server 7 and Oracle provide add-ons that enable you to return a query result as XML. SQL Server 2000 provides this capability by default and can be configured to do so over the Web.

Computational Applications

XSLT also fails to deliver the goods in highly computational applications. Although XSLT certainly provides computational capability, general-purpose programming languages such as C and FORTRAN offer far more functions for complex computations, with much better performance.

Using CSS Instead of XSLT

In scenarios in which your XML documents are targeted at the Web, XSLT might be overkill. If the data doesn’t need to be filtered and is in the right order, Cascading Style Sheets (CSS) do the job very nicely. CSS is by no means restricted to use with HTML. In fact, you can “invent” tags in HTML and attach a style to them. Because for all intents and purposes XML tags can be seen as HTML tags that you invented yourself, attaching a style is as easy as it is in HTML. In this scenario, using XSLT is actually counterproductive. Using CSS is a quick solution that requires no knowledge of XSLT and can probably be handled by most Web designers.

How Does XSLT Work?

With a general view of XSLT under your belt, it’s time to move on to the actual workings of the language. Before you actually start to work with XSLT, however, you must understand some of the basics about how it works.

NEW TERM

The transformation process of an XML document is performed by a processor, which is an application (or software component) that reads an XML document and an XSLT document and applies the XSLT to the XML. Processors exist both as command-line–runnable applications and as software components that can be used in an application. In the next section, I will discuss some of the more common processors and how they are used.

A processor consumes XML and, as such, is built on an XML parser. This parser can load the XML and XSLT documents using DOM and then apply the XSLT to the XML. Another option is a processor based on SAX.
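To make the difference between the two parser models more concrete, here is a minimal sketch in Python that reads a small pets document both ways, using only the Python standard library. It is not taken from any XSLT processor; the handler name PetHandler and the variable names are made up for this illustration.

```python
import xml.sax
from xml.dom import minidom

PETS_XML = b"""<?xml version="1.0" encoding="UTF-8" ?>
<pets>
  <pet type="cat">Max</pet>
  <pet type="parrot" color="red">Peter</pet>
</pets>"""

# DOM: the whole document is loaded into an in-memory tree first,
# and the tree can then be navigated in any order.
dom = minidom.parseString(PETS_XML)
dom_types = [pet.getAttribute("type")
             for pet in dom.getElementsByTagName("pet")]


# SAX: the parser reports events while it reads; no tree is kept.
class PetHandler(xml.sax.ContentHandler):  # hypothetical handler name
    def __init__(self):
        super().__init__()
        self.types = []

    def startElement(self, name, attrs):
        # Called once for every opening tag, in document order.
        if name == "pet":
            self.types.append(attrs.getValue("type"))


handler = PetHandler()
xml.sax.parseString(PETS_XML, handler)

print(dom_types)      # ['cat', 'parrot']
print(handler.types)  # ['cat', 'parrot']
```

Both approaches see the same data. The DOM version can revisit any part of the tree after loading, whereas the SAX version has to collect what it needs while the events go by.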

Note

My discussion of XSLT transformation at this point is purely theoretical and loosely based on the DOM approach. Actual processors most likely do not follow this exact process.

XSLT Transformation Explained

To understand the actual transformation process, examine a sample transformation. Listings 1.3 and 1.4 show a sample XML document and the XSLT document to be applied to it.

LISTING 1.3 Sample XML Document with Pets

<?xml version="1.0" encoding="UTF-8" ?>
<pets>
  <pet type="cat">Max</pet>
  <pet type="parrot" color="red">Peter</pet>
</pets>

ANALYSIS

Listing 1.3 is a simple XML document representing my pets (actually, I don’t have a parrot). The pet names appear between the <pet> tags, which have a type attribute denoting the pet type (in this case, a cat and parrot). For the parrot, I also defined a color attribute. Figure 1.3 shows a tree representation of Listing 1.3.

FIGURE 1.3 Node tree representing Listing 1.3.

NEW TERM

In Figure 1.3, the circles represent elements (tags) in the XML source. The diamond shapes represent attributes (of a tag). The rectangles represent values of either the element or attribute they are associated with. Within this tree, the circles and diamonds are known as nodes. When this tree is transformed using XSLT, the processor starts with the root node and “walks the tree” in the direction shown by the arrows. When the processor encounters a node, it searches for a rule in the XSLT document, matching the name and location within the tree of that particular node. If it finds such a rule, that rule is then applied to that node. This means that the execution of XSLT doesn’t have a step-by-step sequence that is common to languages such as C, FORTRAN, and COBOL. These languages are procedural in nature, following a predetermined sequence of commands. Object-oriented languages, such as C++, Java, and Visual Basic, are based on the same model, except that the sequences are part of operations on objects. The one extra feature that object-oriented languages add to this is event-driven execution, which means that code is executed when some event happens—for instance, when you click a button.

NEW TERM

The execution of XSLT is similar to event-driven execution. In this type of execution, an event determines the sequence in which code is executed. In XSLT, this sequence is determined by the data that is encountered. This is why XSLT is based on data-driven execution, which means that code is executed when a certain piece of data is encountered.

Listing 1.4 contains a simple XSLT document that you can use to transform the XML in Listing 1.3. It consists of four rules that determine whether the code should be executed.

LISTING 1.4 Sample XSLT Document for Pets

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"
    omit-xml-declaration="yes" />

  <xsl:template match="/">
    <xsl:apply-templates />
  </xsl:template>

  <xsl:template match="pets">
    <xsl:apply-templates />
  </xsl:template>

  <xsl:template match="pet">
    My <xsl:value-of select="@type" /> is called <xsl:value-of select="." />.
    <xsl:apply-templates select="@color" />
  </xsl:template>
  <xsl:template match="@color">
    <xsl:value-of select=".." /> is <xsl:value-of select="." />.
  </xsl:template>

</xsl:stylesheet>

The output for Listing 1.4 looks like Listing 1.5.

OUTPUT

LISTING 1.5 Output When Applying Listing 1.4 to Listing 1.3

My cat is called Max.

My parrot is called Peter.
 Peter is red.

Note

Unless otherwise stated, all output is the result of using (Instant) Saxon version 6.2.2. This program and several others will be discussed later in this lesson.

Note

The whitespace in between the lines of text in Listing 1.5 is supposed to be there. It appears based on the way XSLT handles whitespace by default. On Day 7, “Controlling the Output,” you will learn how to remove the whitespace.

ANALYSIS

The first rule, <xsl:template match="/">, is applied when the processing starts, as it matches the root of the document. Within this rule, the command <xsl:apply-templates /> tells the processor to go on to any child node or nodes. In this sample, the child node is pets, whose rule tells the processor to do the same again. This brings the processor to the first pet node, firing the appropriate template, which is the first to actually generate output. The second pet node is processed by the same rule, which goes on to fire the color attribute rule.

As you can see, the output is different for the first pet and the second because the first doesn’t have a color attribute. This is what XSLT is all about. If there is no node (element or attribute), nothing happens. But if there is, the rule is applied. If you have programming experience, you might not think this is much different from a program that reads input and acts on it. Such a program, however, contains explicit commands explaining what to do with certain input. XSLT works the other way around: The commands are there, but they’re used only when applicable.
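The rule-firing behavior described here can be imitated in a few lines of Python. The following is only a rough sketch of the idea, not how an actual XSLT processor is implemented: the function names and the RULES table are invented for this illustration, whitespace handling is ignored, and real processors also have built-in default rules that this sketch leaves out.

```python
import xml.etree.ElementTree as ET

PETS_XML = """<pets>
  <pet type="cat">Max</pet>
  <pet type="parrot" color="red">Peter</pet>
</pets>"""


def apply_templates(element, rules, out):
    """Visit each child element and fire the rule registered for its name."""
    for child in element:
        rule = rules.get(child.tag)
        if rule is not None:  # simplification: no rule means no output
            rule(child, rules, out)


def pets_rule(element, rules, out):
    # Like <xsl:template match="pets">: just move on to the children.
    apply_templates(element, rules, out)


def pet_rule(element, rules, out):
    # Like <xsl:template match="pet">.
    out.append(f"My {element.get('type')} is called {element.text}.")
    # Like <xsl:apply-templates select="@color" />: the color rule
    # fires only when the attribute actually exists.
    if "color" in element.attrib:
        color_rule(element, out)


def color_rule(owner, out):
    # Like <xsl:template match="@color">; ".." is the owning element.
    out.append(f"{owner.text} is {owner.get('color')}.")


# Hypothetical rule table: element name -> rule to fire.
RULES = {"pets": pets_rule, "pet": pet_rule}

root = ET.fromstring(PETS_XML)
lines = []
# Like <xsl:template match="/">: start at the top and walk down.
RULES[root.tag](root, RULES, lines)
print("\n".join(lines))
```

Notice that nothing in the sketch says "process the cat, then the parrot." The order of the output follows from the order of the data, and the color line appears only for the pet that has a color attribute.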

Programming event-driven code can be tricky because you don’t know in which order events may occur; thus, you don’t know in which sequence your code will be executed. On the other hand, the event-driven model more closely resembles what happens in most applications. Writing data-driven code can be even harder because code is executed in a nondeterministic way, depending on whether certain data exists and the data’s location within the document tree. If the location of some piece of data is different, so is the order of execution. The result, however, need not necessarily be different.

Understanding Declarative Programming

NEW TERM

So far, you’ve seen that XSLT is a rule-based programming language. Another major difference between XSLT and some of the more common programming languages is that XSLT also is a declarative programming language. With a declarative programming language, you tell the computer what to do rather than how to do something.

Declarative programming languages abstract the steps the computer has to take from what you want the computer to do. Instead of programming the steps the computer has to take, you specify what you want to happen. Arguably the most-used declarative language is Structured Query Language (SQL). SQL specifies which data you want to get from a database. It doesn’t tell the database how to get the data. That job is left up to the query processor. In a similar fashion, XSLT leaves the “how” up to the processor. This actually accounts for the different parser/processor models. The difference between the “how” with DOM- and SAX-based processors is huge, yet XSLT works equally well on both types without any modification. Similarly, database servers can implement the storage of data completely differently from one another. As long as they can understand SQL, the result is the same.
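The SQL comparison can be tried out with the SQLite database that ships with Python. This is a small sketch under my own assumptions: the pets table, its columns, and its rows are hypothetical and simply mirror the pets example. The same question is answered twice, once declaratively and once step by step.

```python
import sqlite3

# Hypothetical table and data, chosen to mirror the pets example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, type TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [("Max", "cat", None), ("Peter", "parrot", "red")],
)

# Declarative: state which rows you want; the query processor
# decides how to find them.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM pets WHERE color IS NOT NULL ORDER BY name"
    )
]

# Imperative: spell out every step yourself.
imperative = []
for name, _type, color in conn.execute("SELECT name, type, color FROM pets"):
    if color is not None:
        imperative.append(name)
imperative.sort()

print(declarative)  # ['Peter']
print(imperative)   # ['Peter']
conn.close()
```

The results are identical, but only the second version commits you to a particular way of getting there.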

Like rule-based programming, declarative programming takes some getting used to. With nondeclarative languages, you have tight control over what happens because you specify every step of the process. Because you do not specify every step in a declarative language, you have less control over the result (or that’s what you will think anyway). The advantage of declarative programming is that, in general, the programming process will go faster. When you want a slightly different result from the general case, programming will take more effort because you will have to work with a different sort of toolset. Whereas in traditional languages you could just change a processing step, here you are stuck with what the processor comes up with. You have to find a way to use the tools the language provides.

You can compare declarative programming with coaching a football team. As the coach, you don’t compete in an actual game. You just tell the team members what you want them to do. How they do it depends on what happens in the game. The difference between a computer and a football team is that the computer will do exactly what you tell it to do, whereas a team may not. However, even if the computer does exactly what you tell it to do, the result may not be what you expected. Remember Listings 1.3 and 1.4. There, I just wanted to get two lines of text; instead, I got more lines and some whitespace I didn’t expect. To get the whitespace out of the way, I obviously have to specify more closely what I want.

Creating XSLT Files

You have to create XML and XSLT documents before you can do anything. To create both, you have many options; the most common are discussed in the following sections. Most are Windows applications, but some are available for other platforms or are Java based and will run on any platform running the Java Runtime Environment.

Using a Text Editor

You can create XML with a number of tools. Because it is just text, you can use any text editor, such as Notepad, Textpad, or UltraEdit. The advantage of using a text editor is clearly that you can quickly make changes because these programs are lightweight and load fast. Also, text is easily editable. However, using a text editor is not the best approach to writing XSLT (or XML). A major disadvantage is that XML and XSLT have to be structured properly, and the commands need to be accurate. When you use a text editor you can easily make typing errors, and forgetting a closing tag is also common.
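If you do work in a plain text editor, a quick well-formedness check catches mistakes such as a forgotten closing tag before you run a transformation. The following sketch uses Python's standard XML parser; the function name check_well_formed is my own, and the check says nothing about whether the XSLT elements themselves are valid.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


good = '<pets><pet type="cat">Max</pet></pets>'
bad = '<pets><pet type="cat">Max</pets>'  # closing </pet> tag forgotten

print(check_well_formed(good))  # None
print(check_well_formed(bad))   # an error message that includes the position
```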

Using an XML Editor

XML editors come in different forms. The most basic ones offer color-coded display and possibly syntax checking. These editors are often part of a development environment used to create Web sites. A good example of such an editor is Allaire Homesite. Starting with version 4.5.2, it has standard XML tag support, and for earlier versions, extensions are available.

Other types of editors offer an interface in which you can easily edit nodes within a document. These editors represent the XML document as a tree view that you can manipulate. The most basic editor of this type is XML Notepad, which you can freely download from http://msdn.microsoft.com/xml/. XML Pro version 2.01 from Vervet Logic (http://www.vervet.com) also supports this type of interface and has DTD support.

The third type of editor combines source code editing and node tree editing. This type of editor gives you the advantage of less error-prone tree manipulation but at the same time offers you more control over the actual source code. When you get into more specific output, this control is very important. At the time of this writing, one of the best editors is XML Spy (http://www.xmlspy.com), which offers many options. With the useful auto-complete option, you can easily pick XSLT elements and attributes. Because auto-complete is context-sensitive, it shows you only the elements or attributes that are applicable at your location in a document. XML Spy is pluggable, so you can configure it with any parser you want.

The problem with most regular XML editors is that they have no embedded XSLT support. This means that they cannot validate your XSLT tags by default. If they support validation against a Schema or DTD, you can implement this validation yourself. Validating your XSLT documents is paramount if you want to be able to quickly create XSLT documents. Validation cuts down on mistakes such as wrong XSLT tags or attributes that aren’t supported by a specific tag.

XSLT Editors and Debuggers

Some XSLT editors and debuggers are currently available. These applications offer much more than an interface that allows quick creation of XSLT documents. They offer debugging options; for example, they allow you to perform a transformation step by step, with each step showing you which rule is fired. The major advantage of this capability is that you can see what is happening and possibly where you’re going wrong.

eXcelon Stylus Studio

eXcelon Stylus Studio (http://www.stylusstudio.com) enables you to write XSLT completely by hand, aided by an auto-complete tool that shows the available XSLT elements and attributes. Another option is to write the XSLT document only partially by hand, based on an existing XML file. From the tree representation of the XML file, you can also create rules that should be applied to that element. With the built-in parser, you can then step through the transformation process. Other processors can be plugged in, but stepping is not supported in that case.

Marrowsoft Xselerator

Marrowsoft Xselerator (http://www.marrowsoft.com) is not quite as feature rich as Stylus Studio, but it does offer a context-sensitive auto-complete. It offers only those XSLT elements that should be available in the part of the document you’re working in. Like Stylus Studio, it provides stepping and pluggable processors. Xselerator is very easy to use and gives you quick results.

XSL Debugger

An intriguing alternative to the commercial products described in the preceding sections is XSL Debugger, developed by TopXML.com (http://www.topxml.com), a community Web site on XML. Although limited as an editor, it provides solid debugging, which makes it useful as an extra development tool if you’re already using an editor such as XML Spy.

Visual Studio.NET

Visual Studio.NET itself does not provide any XSLT debugging capabilities. However, Visual Studio.NET is highly pluggable, giving third-party vendors the opportunity to create something. ActiveState (http://www.activestate.com) has developed Visual XSLT as a plug-in to Visual Studio.NET. This plug-in provides XSL Debugger–type debugging and tag validation.

XSLT Design Tools

The last type of editor available goes at XSLT creation from the opposite direction. By using an existing XML document, you can create XSLT documents in a WYSIWYG environment. Using this editor is more or less like working with a WYSIWYG HTML editor, but with some added functions to work with XML documents. Whitehall <xsl> Composer (http://www.whitehall.com) is the only such product available on the market. <xsl> Composer is impressive because you don’t need any knowledge of XSLT to use it. However, as with all such environments, getting exactly what you want is very hard.

Processors for XML Transformation with XSLT

Many processors are currently available, and still more are under development. There is no need to discuss them all, so the discussion here is limited to some of the more popular processors: MSXML, Saxon, and Xalan. Although these processors are free, you should read the license agreement before using them. You can find a list of other available parsers at http://www.w3.org/Style/XSL/. Another processor that is bound to be popular is the .NET Framework XSLT processor. It’s not discussed here because it cannot be invoked from the command-line yet. I mention it, however, because preliminary tests show that this processor outperforms all the existing processors by a considerable margin.

MSXML

MSXML is the XML parser/processor available from Microsoft. The first version was shipped along with Internet Explorer 5.0. Because it was shipped before the XSLT specification was final, this version is not fully compliant. Versions 2.0 and 2.6 are a step in the right direction but are still lacking in a few areas. MSXML 3.0 and higher are good and used often by Visual Basic and Active Server Pages (ASP) developers. The latest version is available from http://msdn.microsoft.com/xml/.

MSXML is a component, so it cannot be run as a separate application. If you want to use it, you have to write an application. Having to go to all this trouble sounds pretty bad, but 9 times out of 10, XML and XSLT will be used in a custom application anyway. However, Microsoft has provided a command-line executable called MSXSL, which you also can download from http://msdn.microsoft.com/xml.

Installing MSXML and MSXSL

MSXML comes in a Windows installation package. To install it, run the package and follow the installation steps. Because it has no options, your installation can’t go wrong.

Note

MSXML does not come with any documentation. The documentation is part of the MSXML Software Developers Kit (SDK), which you can download from http://msdn.microsoft.com/xml.

MSXSL comes in a ZIP file. Apart from unpacking it, you don’t have to perform an installation. You can run it from the command-line, but be sure that the executable is either in the current directory or in a directory listed in your path. The easiest way to ensure this is to place MSXSL.exe in the System or System32 directory of Windows. The ZIP file also contains a Word document discussing all the command-line options.

Running MSXSL

To run MSXSL, follow these steps:

1. Open the MS-DOS command prompt.

2. Change to the directory containing your XML and XSL files. You also can specify the full path to the files when you invoke MSXSL.

3. At the command prompt, type the following:

msxsl source.xml stylesheet.xsl

If the syntax of the documents is correct, the output is displayed.

As you can see, transformation from the command prompt is fairly easy. MSXSL command-line options will be discussed in Appendix C, “Command-Line Options for Common Parsers.”

Saxon

Saxon is a Java-based XSLT processor developed by Michael Kay. It comes with a SAX parser but will work with other SAX parsers as well. Because it runs on Java, it will work on any system that has the Java Runtime Environment installed. For Windows users, an executable that can be run from the command prompt also is available. For programmers, Saxon offers an API that can be used with Java.

Installing Saxon

You can download Saxon from http://users.iclway.co.uk/mhkay/saxon/. There, you can choose from two versions: the full version and Instant Saxon. You need to run the full version under Java; you can run Instant Saxon from the Windows command prompt. Both require that the Java Runtime Environment version 1.1 or higher is installed.

After you unpack Instant Saxon, place it in a directory you want. Then run Instant Saxon from the command-line, either in the same directory as the executable, or from another directory if a path to the executable has been defined.

If you install the full version, make sure that the Java classpath environment variable has a reference to saxon.jar.

Running Saxon

To run Saxon, follow these steps:

1. Depending on your operating system, open the command prompt, a command window, or the shell.

2. If you’re using Instant Saxon, type

saxon source.xml stylesheet.xsl

If you’re using Saxon with Java, type

java com.icl.saxon.StyleSheet source.xml stylesheet.xsl

Providing the input is correct, the output should be displayed.

Xalan

Xalan is a processor developed by the Apache XML Project (http://xml.apache.org). The first version, Xalan-C++, is no longer available and has been replaced by Xalan-Java. You can download it from http://xml.apache.org/xalan-j/index.html.

Xalan runs on top of the Xerces-Java parser. It is pluggable, so it will run with other parsers as well. Like Saxon, Xalan offers an API so that you can use Xalan within Java applications.

Installing Xalan-Java

The Xalan-Java processor comes in a ZIP or GNU-ZIP package, which can be extracted to a directory you want. You then need to add a reference in the Java classpath environment variable that points to xalan.jar in the extracted package’s bin directory.

Running Xalan-Java

To run Xalan-Java, follow these steps:

1. Depending on your operating system, open the command prompt, a command window, or the shell.

2. From the command prompt, run Xalan by typing

java org.apache.xalan.xslt.Process -in source.xml -xsl stylesheet.xsl
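If you switch between processors often, you could wrap the command lines shown in this lesson for MSXSL, Saxon, and Xalan in a small script. The sketch below is one possible way to do that in Python. The function names and the processor labels are my own invention; only the command lines themselves come from the steps above. The Java-based variants still require the classpath to be set up as described.

```python
import shutil
import subprocess


def build_command(processor, xml_file, xsl_file):
    """Return the command line for one of the processors in this lesson."""
    # The dictionary keys are hypothetical labels for this sketch.
    commands = {
        "msxsl": ["msxsl", xml_file, xsl_file],
        "instant-saxon": ["saxon", xml_file, xsl_file],
        "saxon": ["java", "com.icl.saxon.StyleSheet", xml_file, xsl_file],
        "xalan": ["java", "org.apache.xalan.xslt.Process",
                  "-in", xml_file, "-xsl", xsl_file],
    }
    return commands[processor]


def transform(processor, xml_file, xsl_file):
    """Run the processor; return its output, or None if it is not found."""
    command = build_command(processor, xml_file, xsl_file)
    if shutil.which(command[0]) is None:
        return None  # executable is not on the path
    result = subprocess.run(command, capture_output=True, text=True,
                            check=True)
    return result.stdout


print(build_command("xalan", "source.xml", "stylesheet.xsl"))
```

Calling transform("instant-saxon", "source.xml", "stylesheet.xsl") would then return the transformation result as a string, provided Instant Saxon is installed and both files exist.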

Summary

Today you learned that XSLT is a language used for manipulating and transforming XML documents. XSLT, which is itself XML, offers a vocabulary of commands that performs certain functions on an XML document. XSLT was developed as part of XSL but can be used separately. XSLT incorporates XPath to select and filter elements and attributes within an XML document.

XML and XSLT have their roots in SGML and DSSSL but are much simpler. Other technologies have been added to the XML family, so it is still growing. Some of these technologies, such as XML namespaces, have great bearing on XSLT.

XSLT is useful in document management scenarios in which you need multiple outputs of the same document. These target documents can be a range of types, such as plain text, HTML, XML, and PDF. When the Web is the only target, CSS may be a viable alternative. Because of concurrency and performance issues, XML is less useful for high-end data storage and for scenarios in which server-side transformation is required.

Many processors are available for XSLT, based on different parsers and different types of parsers (that is, DOM and SAX). The Java-based parsers can be used in Java applications and from the command-line. Microsoft also offers a parser/processor and an add-on to run it from the command-line.

Tomorrow you will learn the basics of XSLT and start working on your first transformation.

Q&A

Q Will XSLT replace CSS?

A Probably not. XSLT is more complex than CSS, so for simple documents, CSS is a good solution that more people know how to use. CSS also can create some effects that XSLT can’t. XSLT and CSS can be used together to create richly formatted documents. A problem with CSS is that it operates only on element data. Data stored in attributes can’t be displayed with CSS; with XSLT, it can.

Q It seems XML/XSLT has some drawbacks that must be overcome. Why should I start learning XSLT now?

A XML/XSLT is one of the fastest growing fields of technology at the moment, with most major corporations backing it. Many of the problems are known and are being addressed. They will be taken care of sooner rather than later. A good example is the XSLT debuggers. Until fairly recently, no XSLT debuggers were available, and not many people had an idea how to create one because XSLT works differently than languages like Visual Basic. Now several very good products are available.

When you start working with XSLT, you also will find that it can easily solve problems that now take you a lot of effort. XSLT, in that sense, is just another tool in your toolbox.

Areas in which XSLT will be of much benefit are applications targeting multiple platforms, applications in multiple languages, translation between different data storage formats, and applications working with relatively unstructured data that needs to be queried.

Workshop

This workshop tests whether you understand all the concepts you learned today. It is helpful to know and understand the answers before starting tomorrow’s lesson. You can find the answers to the quiz questions and exercises in Appendix A.

Quiz

1. True or False: XSLT can transform XML to XML, HTML, and different text-based file formats.

2. True or False: You have to use xsl as a namespace for XSLT.

3. What do you need to run XSLT?

4. XSLT is based on a data-driven programming model. What is meant by this?

5. XSLT is a declarative programming language. What is the difference between declarative languages and languages such as C, Java, and Visual Basic?

6. What kinds of tools can you use to create XSLT documents?

Exercise

1. Create an XML document with the following code:

<?xml version="1.0" encoding="UTF-8" ?>
<pets>
    <pet type="cat">Max</pet>
    <pet type="parrot" color="red">Peter</pet>
</pets>

Now create an XSLT document with following code:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8"
       omit-xml-declaration="yes" />

    <xsl:template match="/">
       <xsl:apply-templates />
    </xsl:template>

    <xsl:template match="pets">
       <xsl:apply-templates />
    </xsl:template>

    <xsl:template match="pet">
       My <xsl:value-of select="@type" /> is called <xsl:value-of select="." />.
       <xsl:apply-templates select="@color" />
    </xsl:template>

    <xsl:template match="@color">
       <xsl:value-of select=".." /> is <xsl:value-of select="." />.
    </xsl:template>

</xsl:stylesheet>


Execute the files using any of the processors discussed today. If you use two different processors, will the output be the same?
