XML Entity and Character References

Like XML, the XQuery syntax allows for the escaping of individual characters using two mechanisms: character references and predefined entity references. These escapes can be used in string literals, as well as in the content of direct element and attribute constructors.

Character references are useful for representing characters that are not easily typed on a keyboard. They take two forms:

  • &# plus a sequence of decimal digits representing the character's code point, followed by a semicolon (;).

  • &#x plus a sequence of hexadecimal digits representing the character's code point, followed by a semicolon (;).

For example, a space can be represented as &#32; or &#x20;. The number always refers to the Unicode code point; it doesn't depend on the query encoding. Table 21-1 lists a few common XML character references.

Table 21-1. XML character reference examples

Character reference    Meaning
&#x20;                 Space
&#xA;                  Line feed
&#xD;                  Carriage return
&#x9;                  Tab
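
As a quick check, the following sketch (not one of the book's examples) compares the decimal and hexadecimal forms. It relies only on the standard functions string-to-codepoints and codepoints-to-string; the comments show the value each expression should return.

```xquery
(: Decimal and hexadecimal references to the same code point are equal :)
"&#32;" eq "&#x20;",                    (: true :)

(: Each reference resolves to exactly one character :)
string-to-codepoints("&#65;&#x42;"),    (: 65, 66 :)

(: Hex digits are case-insensitive: &#xa; and &#xA; are both line feed :)
"&#xa;" eq codepoints-to-string(10)     (: true :)
```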

Predefined entity references are useful for escaping characters that have special meaning in XML syntax. They are listed in Table 21-2.

Table 21-2. Predefined entity references

Entity reference    Meaning
&amp;               Ampersand (&)
&lt;                Less than (<)
&gt;                Greater than (>)
&apos;              Apostrophe/single quote (')
&quot;              Double quote (")

Certain of these characters must be escaped, namely:

  • In literal strings, ampersands, as well as single or double quotes (depending on which was used to surround the literal)

  • In the content of direct element constructors (but not inside curly braces), both ampersands and less-than characters

  • In attribute values of direct element constructors (but not inside curly braces), ampersands and less-than characters, as well as single or double quotes (depending on which was used to surround the attribute value)
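
The rules above can be sketched as follows. This is an illustrative query, not one of the book's examples; the element and attribute names (menu, item, note) are made up for the purpose.

```xquery
(: In string literals, escape & and the quote that delimits the literal :)
let $amp := "Fish &amp; Chips"
let $dq  := "She said &quot;hi&quot;"
let $sq  := 'It&apos;s here'
return
  <menu note="a 5&quot; ruler">
    <item>{$amp}</item>
    <item>{$dq}</item>
    <item>{$sq}</item>
    <item>1 &lt; 2 &amp; 3 &gt; 2</item>
  </menu>
```

In the last item, &lt; and &amp; are required, while &gt; is optional: a literal > is allowed in element content. The note attribute needs &quot; only because its value is delimited by double quotes.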

The set of predefined entities does not include certain entities that are predefined for HTML, such as &nbsp; and &eacute;. If these characters are needed as literals in queries, they should be represented using character references. For example, if your query is generating HTML output and you want to generate a nonbreaking space character, which is often written as &nbsp; in HTML, you can represent it in your query as &#xa0;. If you want to be less cryptic, you can use a variable, as in:

declare variable $nbsp := "&#xa0;";
<h1>aaa{$nbsp}bbb</h1>

Example 21-11 shows a query that uses character and entity references in both a literal string and the content of an element constructor. The first line of the query uses &#65; in place of the letter A in a quoted string. The second line uses various predefined entity references, as well as the character reference &#x20;, which represents the space character inside a direct element constructor.

Example 21-11. Query with XML entities

Query
if (doc("catalog.xml")//product[@dept='&#65;CC'])
then <h1>Accessories &amp; Misc&#x20;List from &lt;catalog&gt;</h1>
else ( )
Results
<h1>Accessories &amp; Misc List from &lt;catalog&gt;</h1>

In element constructors, references must appear directly in the literal content, outside of any enclosed expression. For example, the constructor:

<quoted>&apos;{"abc"}&apos;</quoted>

returns the result <quoted>'abc'</quoted>, while the constructor:

 <quoted>{&apos;"abc"&apos;}</quoted>

raises a syntax error, because &apos; is within the curly braces of the enclosed expression.
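
By contrast, a reference is allowed inside curly braces when it is part of a string literal, because the rules for string literals then apply. A minimal sketch of a working variant:

```xquery
<quoted>{"&apos;abc&apos;"}</quoted>
```

This should return <quoted>'abc'</quoted>, the same result as the first constructor.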

Including an entity or character reference in a query does not necessarily result in a reference in the query results. As you can see from Example 21-11, the results of the query (when serialized) contain a space character rather than a character reference.
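
One way to see that references are resolved when the query is parsed, before any serialization takes place, is to measure the resulting strings. A small sketch using the standard string-length and string functions:

```xquery
string-length("&amp;"),            (: 1, the single character & :)
string-length("&#x20;"),           (: 1, a single space :)
string(<a>&#65;BC</a>) eq "ABC"    (: true :)
```

When the result is serialized as XML, the processor decides how to write each character: an ampersand in element content is written as &amp;, while a space is written as is.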
