Space preservation

The XML standard defines the term 'whitespace' to represent the invisible characters used to format text in a text file. These characters include the space character, the tab character, and the line-feed and carriage-return control characters (used to start a new line in the text file).

Sometimes, whitespace characters are included in an XML document only to make the markup easier to read. The following two examples are considered to be equivalent in terms of the information they contain, but the second is more legible:

<chapter><title>The Chapter Title</title><para>A paragraph.</para></chapter>

<chapter>
  <title>The Chapter Title</title>
  <para>A paragraph.</para>
</chapter>

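In both examples, the Chapter element appears to contain two children, namely the Title element and the Para element. However, the second fragment is arranged so that whitespace characters appear between the child elements to create the new lines and indentation. Each run of whitespace forms a text node, so the Chapter element in that fragment actually has five child nodes: three whitespace-only text nodes and the two elements.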

Note that even if the indents were to be removed, the same number of child nodes would be present, as there would still be a text node for each line-feed:

<chapter>[LF]
<title>The Chapter Title</title>[LF]
<para>A paragraph.</para>[LF]
</chapter>

By default, an XSLT processor does not treat text nodes that contain only whitespace characters differently from other nodes. It would retain all five child nodes of the Chapter element in the example above. Because the two XML fragments shown earlier are considered equivalent, it should not normally matter that the XSLT processor preserves these whitespace nodes.
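
The node count can be checked with a short stylesheet. The following is a minimal sketch, assuming an XSLT 1.0 processor, the conventional xsl prefix, and a parser that passes whitespace-only text nodes through to the processor unchanged (some parsers discard them unless configured otherwise). It writes the number of child nodes of the Chapter element as plain text:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit plain text so that only the count appears in the output. -->
  <xsl:output method="text"/>

  <!-- node() matches element and text children alike, so
       whitespace-only text nodes are included in the count. -->
  <xsl:template match="/chapter">
    <xsl:value-of select="count(node())"/>
  </xsl:template>

</xsl:stylesheet>
```

Under those assumptions, the multi-line fragment should report 5 and the single-line fragment should report 2.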

However, it is possible to instruct the processor to remove text nodes that only contain whitespace characters. The Strip Space element is used to indicate which elements to strip such text from. If the Elements attribute contains just '*', this (as usual) indicates that all elements are to be stripped of insignificant whitespace:

<strip-space elements="*" />

This is particularly useful when outputting a text file, rather than a new XML document, as many text file formats treat all whitespace as significant. In such circumstances, it is necessary to remove all whitespace that occurred only to make the source file easier to read.
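
As a sketch of the text-output case, the stylesheet below strips all whitespace-only text nodes and then writes each Title and Para element on its own line. It assumes an XSLT 1.0 processor and the conventional xsl prefix, and it relies on the built-in template rules to walk from the document root down to the Title and Para elements:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Discard whitespace-only text nodes from every source element. -->
  <xsl:strip-space elements="*"/>

  <!-- Write the text of each title and paragraph, followed by a
       line-feed supplied explicitly by the stylesheet. -->
  <xsl:template match="title | para">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Without the Strip Space element, the built-in rule for text nodes would also copy the line-feeds and indentation between the child elements into the output, and extra blank lines and leading spaces would appear.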

Even when outputting XML structures, there are occasions when it is useful to remove whitespace, if only to reduce the size of the file.

Stripping whitespace from elements that have mixed content (a mixture of elements and text) can be dangerous. Consider the following example:

<para>Another <emph>great</emph> <name>ACME</name>
product</para>

In this example, there is a whitespace-only text node between the emphasized word and the company name. Stripping this space would cause these terms to be merged on output:

Another greatACME product

Even when the Strip Space elements have been defined as described above, it is possible to make exceptions for such cases. The Preserve Space element also uses an Elements attribute to list elements that are not to be trimmed in this way:

<preserve-space elements="para" />

Recall that, by default, whitespace is preserved in all elements, and so the following rule is implied:

<preserve-space elements="*" />

When both the Strip Space and Preserve Space elements are used, any conflicts are resolved using the same precedence rules as for template selections. For example, when one element contains '*' and the other contains an element name, the element name is more explicit than '*' and so overrules the general case. The Elements attribute is allowed to contain '*', or a space-separated list of element names (which can be qualified with a namespace, such as 'html:pre html:code').
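
As an illustration of how the two elements combine, the sketch below strips whitespace-only text nodes everywhere except inside Para elements, and otherwise copies the source document unchanged. It assumes an XSLT 1.0 processor and the conventional xsl prefix. The identity template is not part of the discussion above and is included only to make the example complete:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- General rule: strip whitespace-only text nodes from all elements. -->
  <xsl:strip-space elements="*"/>

  <!-- Exception: an explicit element name overrules '*', so
       whitespace-only text nodes directly inside para are kept. -->
  <xsl:preserve-space elements="para"/>

  <!-- Identity template: copy every node and attribute as it stands. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the mixed-content example, the space between the Emph and Name elements should survive, while the line-feeds and indentation between the children of an element such as Chapter are removed.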

Stylesheet whitespace

Spaces are often used in stylesheets to make them easier to read, just as they are in other XML documents (as described above). In this case, all insignificant whitespace is removed by the XSLT processor automatically. The two examples below would produce identical output:

<xsl:template match="para"><P><xsl:apply-templates/></P></xsl:template>

<xsl:template match="para">
<P>
  <xsl:apply-templates/>
</P>
</xsl:template>

This is why the Text element is so important. It preserves whitespace that would otherwise be removed.
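
A minimal sketch of the Text element in use follows. The source element names (person, first and last) are hypothetical and chosen only for illustration. The example assumes an XSLT 1.0 processor and the conventional xsl prefix:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- The line-feeds and indentation inside this template are
       whitespace-only text nodes, so the processor removes them
       from the stylesheet. The single space inside xsl:text is
       kept and is written between the two values. -->
  <xsl:template match="person">
    <xsl:value-of select="first"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="last"/>
  </xsl:template>

</xsl:stylesheet>
```

If the Text element were replaced by a bare space character, that space would be removed along with the rest of the stylesheet's formatting whitespace, and the two values would run together in the output.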
