Text Nodes

Text nodes represent the character data content within elements. Each contiguous run of characters within element content forms a single text node. Text nodes can be both queried and constructed in XQuery, although these expressions have limited usefulness.

Text Nodes and the Data Model

A text node does not have any children, and its parent is an element. In Example 21-8, the desc element has three children:

  • A text node whose content is Our (ending with a space)

  • A child element i

  • A text node whose content is shirt! (starting with a space)

The i element itself has one child: a text node whose content is favorite.

Example 21-8. Text nodes in XML (desc.xml)

<desc>Our <i>favorite</i> shirt!</desc>

The string value of a text node is its content, as an instance of xs:string. Its typed value is the same as the string value, except that it is of type xs:untypedAtomic rather than xs:string.

Text nodes do not have names, so calling any of the name-related functions with a text node will result in the empty sequence or a zero-length string, depending on the function.
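The following query is a minimal sketch of these rules. It assumes an inline copy of the desc element from Example 21-8 (bound to the hypothetical variable $desc) rather than the desc.xml file, so that it can be run on its own:

```xquery
(: $desc is an inline stand-in for Example 21-8 :)
let $desc := <desc>Our <i>favorite</i> shirt!</desc>
let $t := $desc/text( )[1]
return (
  string($t) instance of xs:string,       (: true :)
  data($t) instance of xs:untypedAtomic,  (: true :)
  name($t) eq "",                         (: true: zero-length string :)
  empty(node-name($t))                    (: true: empty sequence :)
)
```

Here name returns a zero-length string, while node-name returns the empty sequence.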

If your document has no DTD or schema, any whitespace appearing between the tags in your source XML will be translated into text nodes. This is true even if it is just there to indent the document. For example, the following b:header element node:

<b:header>
   <b:date>2006-10-15</b:date>
</b:header>

has three children. The first and third children are text nodes that contain only whitespace, and the second child is, of course, the b:date element node. If a DTD or schema is used, and the element's type allows only child elements (no character data content), then the whitespace will be discarded and b:header will not have text node children.

In the data model, there are never two adjacent text nodes with the same parent; all adjacent text is merged into a single text node. This means that if you construct a new element using:

<example>{1}{2}{3}</example>

the resulting example element will have only one text node child, whose value is 123. There is also no such thing as an empty text node, so the element constructor:

<example>{""}</example>

will result in an element with no children at all.
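As a rough check of both rules, the sketch below counts the children of each constructed element. The $spaced variable is an addition for contrast: when several atomic values come from a single enclosed expression, they are separated by spaces instead.

```xquery
let $merged := <example>{1}{2}{3}</example>
let $spaced := <example>{1, 2, 3}</example>
let $empty  := <example>{""}</example>
return (
  count($merged/text( )),  (: 1 :)
  string($merged),         (: "123" :)
  string($spaced),         (: "1 2 3" :)
  count($empty/node( ))    (: 0 :)
)
```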

Querying Text Nodes

Text nodes can be queried using path expressions. The text( ) kind test can be used to specifically ask for text nodes. For example:

doc("desc.xml")//text( )

will return all three text nodes in the document, while:

doc("desc.xml")/desc/text( )

will return only the two text nodes that are children of desc.

The node( ) kind test will return text nodes as well as all other node kinds. For example:

doc("desc.xml")/desc/node( )

will return a sequence consisting of the first text node, the i element node, and the second text node. This is in contrast to *, which selects child element nodes only.
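The differences among these tests can be sketched by counting what each one selects. This version assumes an inline copy of the desc element (the hypothetical variable $desc) in place of doc("desc.xml"):

```xquery
let $desc := <desc>Our <i>favorite</i> shirt!</desc>
return (
  count($desc//text( )),  (: 3: all text nodes, including the one inside i :)
  count($desc/text( )),   (: 2: only text node children of desc :)
  count($desc/node( )),   (: 3: text node, i element, text node :)
  count($desc/*)          (: 1: the i element only :)
)
```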

Text Nodes and Sequence Types

The text( ) keyword can also be used in sequence types to match text nodes. For example, to display the content of a text node as a string, you could use the function shown in Example 21-9. The use of the text( ) sequence type in the function signature ensures that only text nodes are passed to this function.

Example 21-9. Function that displays text nodes

declare function local:displayTextNodeContent
  ($textNode as text( )) as xs:string {
  concat("Content of the text node is ", $textNode)
};

A text node will also match the node( ) and item( ) sequence types.
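A short sketch of these matching rules, using instance of and an inline copy of the desc element (the variable names are illustrative only):

```xquery
let $desc := <desc>Our <i>favorite</i> shirt!</desc>
let $t := $desc/text( )[1]
return (
  $t instance of text( ),       (: true :)
  $t instance of node( ),       (: true :)
  $t instance of item( ),       (: true :)
  $desc/i instance of text( )   (: false: an element is not a text node :)
)
```

For the same reason, passing $desc/i to a function whose parameter is declared as text( ) raises a type error (XPTY0004) rather than silently converting the element.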

Why Work with Text Nodes?

Because text nodes contain all the data content of elements, it may seem that the text( ) kind test would be used frequently and would be covered earlier in this book. However, because of atomization and casting, it is often unnecessary to ask explicitly for the text nodes. For example, the expression:

doc("catalog.xml")//product[name/text( )="Floppy Sun Hat"]

has basically the same effect as:

doc("catalog.xml")//product[name="Floppy Sun Hat"]

because the name element is atomized before being compared to the string Floppy Sun Hat. Likewise, the expression:

distinct-values(doc("catalog.xml")//product/number/text( ))

is very similar to:

distinct-values(doc("catalog.xml")//product/number)

because the function conversion rules call for atomization of the number elements.

One difference is that text nodes, when atomized, result in untyped values, while element nodes will take on the type specified in the schema. Therefore, if your number element is of type xs:integer, the second distinct-values expression above will compare the numbers as integers. The first expression will compare them as untyped values, which, according to the rules of the distinct-values function, means that they are treated like strings.
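The effect of typing can be sketched without a schema by constructing the atomic values directly. The values 1 and 01 here are made up for illustration; they stand in for what atomization would produce from text nodes (untyped) and from schema-typed elements (xs:integer):

```xquery
(
  (: untyped values are compared as strings, so "1" and "01" differ :)
  count(distinct-values((xs:untypedAtomic("1"), xs:untypedAtomic("01")))),  (: 2 :)
  (: integers are compared numerically, so 1 and 01 are the same value :)
  count(distinct-values((xs:integer("1"), xs:integer("01"))))               (: 1 :)
)
```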

Warning

Not only is it almost always unnecessary to use the node test text( ), it sometimes yields surprising results. For example, the expression:

doc("catalog.xml")//product[4]/desc/text( )

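returns only two text nodes, whose content is Our (ending with a space) and shirt! (starting with a space), instead of the full text Our favorite shirt!. This is because only the text nodes that are direct children of the desc element are selected. If /text( ) is left out, the expression returns the desc element itself, whose string value is Our favorite shirt!.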
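To see the difference, the sketch below joins the selected text nodes with a visible separator and compares the result with the string value of the element. It assumes an inline copy of the desc element rather than the fourth product in catalog.xml:

```xquery
let $desc := <desc>Our <i>favorite</i> shirt!</desc>
return (
  string-join($desc/text( ), "|"),  (: "Our | shirt!" :)
  string($desc)                     (: "Our favorite shirt!" :)
)
```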

There are some cases where the text( ) sequence type does come in handy, though. One case is when you are working with mixed content and want to work with each text node specifically. For example, suppose you wanted to modify the product catalog to change all the i elements to em elements (without knowing in advance where i elements appear). You could use the recursive function shown in Example 21-10.

Example 21-10. Testing for text nodes

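declare function local:change-i-to-em
  ($node as element( )) as node( ) {
  (: rebuild the element under its own name, keeping its attributes :)
  element {node-name($node)} {
    $node/@*,
    for $child in $node/node( )
    return if ($child instance of text( ))
           then $child  (: copy text nodes as is :)
           else if ($child instance of element(i))
                (: replace i with em, copying its attributes and children :)
                then <em>{$child/@*,$child/node( )}</em>
                else if ($child instance of element( ))
                     then local:change-i-to-em($child)  (: recurse :)
                     else ( )  (: drop comments and processing instructions :)
  }
};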

The function checks each child of an element node. If it encounters a text node, it copies it as is. When it encounters an i element, it constructs an em element that includes the original attributes and children of i. If it encounters any other element child, it recursively calls itself to process that child element's children. Any other kind of child node, such as a comment or processing instruction, is left out of the result.

It is important in this case to test for text nodes because the desc element has mixed content; it contains both text nodes and child element nodes. If you throw away the text nodes, it changes the content of the document.
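The same kind of test can also be written with a typeswitch expression. The function below is a hypothetical variant (the name local:i-to-em-deep is made up for this sketch, and it is not the function from Example 21-10); unlike that example, it also recurses into the children of each i element, so nested i elements are converted too:

```xquery
declare function local:i-to-em-deep
  ($node as node( )) as node( )* {
  typeswitch ($node)
    (: text nodes are copied unchanged :)
    case text( ) return $node
    (: i becomes em; its children are processed recursively :)
    case element(i) return
      <em>{$node/@*,
           for $c in $node/node( ) return local:i-to-em-deep($c)}</em>
    (: any other element is rebuilt under its own name :)
    case element( ) return
      element {node-name($node)} {
        $node/@*,
        for $c in $node/node( ) return local:i-to-em-deep($c)
      }
    (: comments and processing instructions are dropped :)
    default return ( )
};

local:i-to-em-deep(<desc>Our <i>very <i>favorite</i></i> shirt!</desc>)
```

Under these assumptions, the call should return a desc element whose content is Our, followed by an em element containing very and a nested em element containing favorite, followed by shirt!, with all of the original text nodes preserved.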

Constructing Text Nodes

You can also construct text nodes, using a text node constructor. The syntax of a text node constructor, shown in Figure 21-4, consists of an expression enclosed by text{ and }. For example, the expression:

text{concat("Sequence number: ", $seq)}

will construct a text node whose content is Sequence number: 1, assuming that $seq is bound to the value 1.

Figure 21-4. Syntax of a text node constructor

The value of the expression used in the constructor is atomized (if necessary) and cast to xs:string. Explicit text node constructors have limited usefulness in XQuery because text nodes are created automatically by element constructors that contain literal text or expressions that return atomic values. For example, the expression:

<example>{concat("Sequence number: ", $seq)}</example>

will automatically create a text node as a child of the example element node. No explicit text node constructor is needed.
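The two forms can be compared side by side in a small sketch. The binding of $seq to 1 is an assumption made here so that the query is self-contained:

```xquery
let $seq := 1  (: assumed value, for illustration only :)
let $t := text{concat("Sequence number: ", $seq)}
let $e := <example>{concat("Sequence number: ", $seq)}</example>
return (
  $t instance of text( ),            (: true :)
  string($t),                        (: "Sequence number: 1" :)
  string($e/text( )) eq string($t)   (: true: same content either way :)
)
```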
