Chapter 21. Working with Qualified Names, URIs, and IDs

This chapter describes the functions and constructors that act on namespace-qualified names, Uniform Resource Identifiers (URIs), and IDs. Each of these types has unique properties and complexities that set it apart from simple strings.

Working with Qualified Names

The type xs:QName is used to represent qualified names in XQuery. An xs:QName value has three parts: a namespace, a local part, and an associated prefix. The namespace and the prefix are optional. If a QName does not have a namespace associated with it, it is considered to be in “no namespace.”

A prefix may be used to represent a namespace in a qualified name, for example, in an XML document. The prefix is bound to a namespace by using a namespace declaration. The prefix itself has no meaning; it is just a placeholder. Two QNames that have the same local part and namespace are equivalent, regardless of prefix. However, the XQuery processor does keep track of a QName’s prefix. This simplifies certain processes like serializing QNames and casting them to strings.

Most query writers who are working with qualified names are working with the names of elements and attributes. (It is also possible for a qualified name to appear as element content or as an attribute value, but this is less common.) You may want to retrieve all or part of a name if, for example, you want to test to see if it is a particular value, or you want to include the name in the query results. You may want to construct a qualified name for a node if you are dynamically creating the name of a node, using a computed element constructor. These two use cases are discussed in this section.

Retrieving Node Names

Four functions retrieve the names, or parts of the names, of element and attribute nodes: node-name, name, local-name, and namespace-uri. They are summarized in Table 21-1.

Table 21-1. Functions that return node names
Function name   Description
node-name       The qualified name of the node as an xs:QName
name            The qualified name of the node as an xs:string that may be prefixed
local-name      The local part of the node name as an xs:string
namespace-uri   The namespace part of a node name (a full namespace name, not a prefix) as an xs:anyURI

Each of these functions takes as an argument a single (optional) node. Table 21-2 shows examples of all four functions. They use the input document names.xml, shown in Example 21-1.

Example 21-1. Namespaces in XML (names.xml)
<noNamespace>
  <pre:prefixed xmlns="http://datypic.com/unpre"
                xmlns:pre="http://datypic.com/pre">
    <unprefixed pre:prefAttr="a" noNSAttr="b">123</unprefixed>
  </pre:prefixed>
</noNamespace>

Note that the original prefixes from the input document (or lack thereof) are taken into account when retrieving the names. For example, calling the name function with the unprefixed element results in the unprefixed string unprefixed. This does not mean that the unprefixed element is not in a namespace; it is in the http://datypic.com/unpre namespace. It simply indicates that the unprefixed element was not prefixed in the input document, because its namespace was the default, and therefore had no prefix as part of its QName. Therefore, if you are testing the name or manipulating it in some way, it is best to use node-name rather than name, because node-name provides a result that includes the namespace.

Table 21-2. Examples of the name functions
Node            node-name returns an xs:QName with:   name returns   local-name returns   namespace-uri returns

noNamespace     Namespace: empty                      noNamespace    noNamespace          A zero-length string
                Prefix: empty
                Local part: noNamespace

pre:prefixed    Namespace: http://datypic.com/pre     pre:prefixed   prefixed             http://datypic.com/pre
                Prefix: pre
                Local part: prefixed

unprefixed      Namespace: http://datypic.com/unpre   unprefixed     unprefixed           http://datypic.com/unpre
                Prefix: empty
                Local part: unprefixed

@pre:prefAttr   Namespace: http://datypic.com/pre     pre:prefAttr   prefAttr             http://datypic.com/pre
                Prefix: pre
                Local part: prefAttr

@noNSAttr       Namespace: empty                      noNSAttr       noNSAttr             A zero-length string
                Prefix: empty
                Local part: noNSAttr
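
The advice to prefer node-name over name when testing names can be sketched as follows. This is a minimal example, not a definitive recipe: it assumes that names.xml from Example 21-1 is available at that relative URI, it uses the QName function described later in this chapter, and the variable names are illustrative only.

```xquery
(: Build the expected name from a namespace and a local part :)
let $target := QName("http://datypic.com/unpre", "unprefixed")
let $doc := doc("names.xml")
return (
  (: compares namespace and local part, so the prefix is irrelevant :)
  $doc//*[node-name(.) = $target],
  (: compares a string, so it depends on how the name was written in the input :)
  $doc//*[name(.) = "unprefixed"]
)
```

With the document in Example 21-1, both predicates select the same element. If the document were rewritten to bind a prefix to the http://datypic.com/unpre namespace, only the node-name test would still match.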

Suppose you want to create a report on the product catalog. You want to list all the properties of each product element in an HTML list. You could accomplish this by using the query shown in Example 21-2. It uses the local-name function to return the names like name, colorChoices, and desc, allowing them to appear as part of the report.

Example 21-2. Using names as result data

Query

<html>{
  for $prod in doc("catalog.xml")//product
  return (<p>Product # {string($prod/number)}</p>,
          <ul>{
            for $child in $prod/(* except number)
            return <li>{local-name($child)}: {string($child)}</li>
          }</ul>)
}</html>

Results

<html>
  <p>Product # 557</p>
  <ul>
    <li>name: Fleece Pullover</li>
    <li>colorChoices: navy black</li>
  </ul>
  <p>Product # 563</p>
  <ul>
    <li>name: Floppy Sun Hat</li>
  </ul>
  <p>Product # 443</p>
  <ul>
    <li>name: Deluxe Travel Bag</li>
  </ul>
  <p>Product # 784</p>
  <ul>
    <li>name: Cotton Dress Shirt</li>
    <li>colorChoices: white gray</li>
    <li>desc: Our favorite shirt!</li>
  </ul>
</html>

Constructing Qualified Names

There are several ways to construct qualified names. Qualified names are constructed automatically when you are using direct element and attribute constructors. They can also be constructed directly from strings in certain expressions, such as computed element constructors. In addition, three functions are available to construct QNames: the xs:QName constructor, the QName function, and the resolve-QName function.

The xs:QName type has a constructor just like all other built-in simple types. The argument may be prefixed (e.g., prod:number) or unprefixed (e.g., number).

A function called QName can also be used to construct QNames. Unlike the xs:QName constructor, it can be used to generate names dynamically. It accepts a namespace URI and name (optionally prefixed), and returns a QName. For example:

QName("http://datypic.com/prod", "pre:child")

returns a QName with the namespace http://datypic.com/prod, the local part child, and the prefix pre. As with any function call, the arguments are not required to be literal strings. You could just as easily use an expression such as concat("pre:", $myElName) to express the name.
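
As a hedged sketch of that idea, the following query builds a name at run time and uses it in a computed element constructor. The variable $myElName, its value, and the element content are assumptions made for illustration.

```xquery
let $myElName := "child"
(: the second argument is an ordinary expression, not a literal :)
let $qname := QName("http://datypic.com/prod", concat("pre:", $myElName))
return element { $qname } { "some content" }
```

The result should be an element named pre:child in the http://datypic.com/prod namespace, serialized with a namespace declaration that binds the pre prefix.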

A third option is the resolve-QName function, which accepts two arguments: a string and an element. The string represents the name, which may have a prefix. The element is used to determine the appropriate namespace URI for that prefix. Typically, this function is used to resolve a QName appearing in the content of a document against the namespace context of the element where the QName appears. For example, to retrieve all products that carry the attribute xsi:type="prod:ProductType", you can use a path such as:

declare namespace prod = "http://datypic.com/prod";

doc("catalog.xml")//product[resolve-QName(@xsi:type, .)
                            = xs:QName("prod:ProductType")]

This test allows the value of xsi:type in the input document to use any prefix (not just prod) as long as it is bound to the http://datypic.com/prod namespace.

Other Name-Related Functions

Three functions exist to extract parts of an xs:QName:

local-name-from-QName

Returns the local part of the name as a string

prefix-from-QName

Returns the prefix as a string

namespace-uri-from-QName

Returns the namespace URI

The local-name-from-QName and namespace-uri-from-QName functions are similar to the local-name and namespace-uri functions, respectively, except that they take an atomic xs:QName rather than a node as an argument. If you are working with element or attribute names, it is easier to use the functions for retrieving node names, such as local-name and name.
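
A short sketch of the three extraction functions applied to atomic xs:QName values follows. The two names are built with the QName function purely for illustration; the prefix other is an assumption.

```xquery
let $name := QName("http://datypic.com/prod", "pre:child")
let $other := QName("http://datypic.com/prod", "other:child")
return (
  local-name-from-QName($name),      (: child :)
  prefix-from-QName($name),          (: pre :)
  namespace-uri-from-QName($name),   (: http://datypic.com/prod :)
  $name eq $other                    (: true: prefixes play no part in equality :)
)
```

The last comparison illustrates the earlier point that two QNames with the same namespace and local part are equivalent, regardless of prefix.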

XQuery also has two other prefix-related functions: in-scope-prefixes and namespace-uri-for-prefix. The in-scope-prefixes function returns a list of all the prefixes that are in scope for a given element, as a sequence of strings. The namespace-uri-for-prefix function retrieves the namespace URI associated with a particular prefix, in the scope of a specified element. Because most processing is based on namespaces rather than prefixes (which are technically irrelevant), these functions are not especially useful to the average query writer.
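
For the occasions when prefixes do matter, the two functions can be combined as in the following sketch. It assumes that names.xml from Example 21-1 is available at that relative URI and that it contains exactly one element with the local name unprefixed.

```xquery
let $elem := doc("names.xml")//*:unprefixed
for $prefix in in-scope-prefixes($elem)
(: a zero-length prefix stands for the default namespace :)
return concat("'", $prefix, "' -> ", namespace-uri-for-prefix($prefix, $elem))
```

For this input, the result should include an entry for the zero-length prefix (bound to http://datypic.com/unpre), one for pre, and one for the built-in xml prefix. The order of the prefixes is not guaranteed.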

Working with URIs

Uniform Resource Identifiers (URIs) are used to uniquely identify resources, and they may be absolute or relative. Absolute URIs provide the entire context for identifying the resources, such as http://datypic.com/prod.html. Relative URI references are specified as the difference from a base URI, such as ../prod.html. A URI reference may also contain a fragment identifier following the # character, such as ../prod.html#shirt.

The three previous examples happen to be HTTP Uniform Resource Locators (URLs), but URIs also encompass URLs of other schemes (e.g., FTP, gopher, telnet), as well as Uniform Resource Names (URNs). URIs are not required to be dereferenceable; that is, it is not necessary for there to be a web page or other resource at http://datypic.com/prod.html in order for this to be a valid URI. Sometimes URIs just serve as names. For example, in XQuery, URIs are used as the names of namespaces and collations.

Internationalized Resource Identifiers (IRIs) are an extension of URIs that allow a wider, more international set of characters to appear without being escaped. Generally, the term URI is used in this book (and in the XQuery specification) to mean “URI or IRI.” There are no functions or operations in XQuery that support URIs without also supporting IRIs.

The built-in type xs:anyURI represents a URI reference. Most XQuery functions that accept URIs as arguments call for xs:string values instead, but an xs:anyURI value is acceptable also. This is because of a special type-promotion rule that allows xs:anyURI values to be automatically promoted to xs:string when a string is expected. Most of the URI-related functions return xs:anyURI values, following the philosophy of being liberal in what they accept and specific in what they produce.
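
The promotion rule can be seen in a small sketch like the one below, which passes an xs:anyURI value to two string functions. The URI is the one used earlier in this section; the choice of functions is arbitrary.

```xquery
let $uri := xs:anyURI("http://datypic.com/prod.html")
return (
  starts-with($uri, "http://"),   (: true: $uri is promoted to xs:string :)
  contains($uri, "datypic")       (: true :)
)
```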

Base and Relative URIs

Relative URIs are interpreted relative to an absolute URI, known as a base URI. For example, the relative URI prod.html is useless unless interpreted in the context of an absolute URI. In HTML documents, the base URI is often the URI of the document itself. If an HTML document is located at http://datypic.com/order.html, and it contains a link to prod.html, that relative URI is resolved in the context of http://datypic.com/order.html, and the link points to http://datypic.com/prod.html.

Using the xml:base attribute

In XML documents, you can also explicitly specify a base URI using the xml:base attribute. The scope of each xml:base attribute is the element on which it appears and all its content.

Example 21-3 shows an XML document that uses the xml:base attribute on the catalog elements, with relative URI references (the href attributes) for each product. The href="prod443.html" attribute of the first product element, for example, is resolved relative to the xml:base attribute of the first catalog element, namely http://datypic.com/ACC/.

Example 21-3. Document using xml:base (http://datypic.com/input/cats.xml)
<catalogs>
  <catalog name="ACC" xml:base="http://datypic.com/ACC/">
    <product number="443" href="prod443.html"/>
    <product number="563" href="prod563.html"/>
  </catalog>
  <catalog name="WMN" xml:base="http://datypic.com/WMN/">
    <product number="557" href="prod557.html"/>
  </catalog>
</catalogs>

Finding the base URI of a node

The base-uri function can be used to retrieve the base URI of a node. For document nodes, the base URI is the URI from which the document was retrieved. For example:

base-uri(doc("http://datypic.com/input/cats.xml"))

returns http://datypic.com/input/cats.xml.

For an element node, the base URI is the value of its own xml:base attribute, if it has one, or otherwise that of the nearest ancestor that carries an xml:base attribute. For example, if $prod is bound to the first product element in cats.xml, the function call:

base-uri($prod)

returns http://datypic.com/ACC/, because that is the xml:base value of its nearest ancestor.

If no xml:base attribute appears on the element or on any of its ancestors, the element's base URI defaults to the base URI of the document node, if one exists.
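
The following sketch applies base-uri to several nodes of Example 21-3. It assumes that the document can actually be retrieved from the URI shown in the example's caption; the expected values in the comments follow from the rules just described.

```xquery
let $cats := doc("http://datypic.com/input/cats.xml")
return (
  (: each product inherits the xml:base of its parent catalog :)
  for $prod in $cats//product
  return concat($prod/@number, ": ", base-uri($prod)),
  (: catalogs has no xml:base, so it falls back to the document's base URI :)
  string(base-uri($cats/catalogs))
)
```

Under those assumptions, the first three results should be 443: http://datypic.com/ACC/, 563: http://datypic.com/ACC/, and 557: http://datypic.com/WMN/, followed by http://datypic.com/input/cats.xml.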

Resolving URIs

The resolve-uri function takes a relative URI and a base URI as arguments, and constructs an absolute URI. For example, the function call:

resolve-uri("prod.html", "http://datypic.com/order.html")

returns http://datypic.com/prod.html.
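
Two further cases are sketched below: a relative reference that climbs one directory, and a first argument that is already absolute. The catalog directory and the example.org URI are assumptions chosen for illustration.

```xquery
(
  (: ".." removes the catalog directory: http://datypic.com/prod.html :)
  resolve-uri("../prod.html", "http://datypic.com/catalog/order.html"),
  (: already absolute, so it is returned unchanged :)
  resolve-uri("http://example.org/other.html", "http://datypic.com/order.html")
)
```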

Static base URI

The base URI of an individual node is set by the xml:base attribute or by the document URI. There is also a separate base URI, known as the static base URI. The static base URI is used in several cases:

  • When an element is constructed in a query, its base URI is set to the static base URI, if it is not absent. Otherwise, its base URI is the empty sequence.

  • When relative URI references are used in certain expressions, or in arguments to functions like the doc and collection functions, they are resolved relative to the static base URI.

  • When a base URI argument is not provided to the resolve-uri function, it resolves the URI relative to the static base URI.

The static base URI can be set in the query prolog, using a base URI declaration. Its syntax is shown in Figure 21-1.

Figure 21-1. Syntax of a base URI declaration

Here’s an example of a base URI declaration:

declare base-uri "http://datypic.com";

The base URI must be a literal value in quotes (not an evaluated expression), and it should be a syntactically valid absolute URI.

It is also possible for the processor to set the static base URI outside the scope of the query. Although it is implementation-defined, it’s reasonable to expect that if the query itself is read from a file, the static base URI will default to the location of that file. The value of the static base URI can be retrieved using the static-base-uri function.
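
The three uses of the static base URI listed above can be observed together in a sketch like the following. The trailing slash in the declared URI and the note element are choices made for this illustration, not requirements.

```xquery
declare base-uri "http://datypic.com/";

(
  static-base-uri(),          (: http://datypic.com/ :)
  resolve-uri("prod.html"),   (: http://datypic.com/prod.html :)
  base-uri(<note/>)           (: http://datypic.com/, taken from the static base URI :)
)
```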

Documents and URIs

When accessing an input document using the doc function, a URI is used to specify the document of interest. Processors interpret the URI passed to the doc function in different ways. Some, like Saxon, will dereference the URI, that is, go out to the URL and retrieve the resource at that location. Other implementations, such as those embedded in XML databases, consider the URIs to be just names. The processor might take the name and look it up in an internal catalog to find the document associated with that name.

Finding the URI of a document

You can find the absolute URI from which a document node was retrieved by using the document-uri function. This function is basically the inverse of the doc function. Where the doc function accepts a URI and returns a document node, the document-uri function accepts a document node and returns a URI.

For example, if the variable $orderDoc is bound to the result of doc("http://datypic.com/input/order.xml"), then document-uri($orderDoc) returns "http://datypic.com/input/order.xml".

In most cases, this has the same effect as calling the base-uri function on the document node.
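
A quick way to check this for a particular document and processor is sketched below. It assumes that order.xml can be retrieved from the URI used in the previous example.

```xquery
let $orderDoc := doc("http://datypic.com/input/order.xml")
return (
  document-uri($orderDoc),
  (: usually true, but not guaranteed in every implementation :)
  string(document-uri($orderDoc)) = string(base-uri($orderDoc))
)
```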

Opening a document from a dynamic value

Most of the examples of the doc function in this book use a hardcoded URI, as in doc("order.xml"). However, suppose you wanted to open the documents referenced in Example 21-3. For example, you want to open the product information page for product number 443. Its relative URI is prod443.html, and its base URI is http://datypic.com/ACC/. To do this, you could use:

let $prod := doc("cats.xml")/catalogs/catalog[1]/product[1]/@href
let $absoluteURI := resolve-uri($prod, base-uri($prod))
return doc($absoluteURI)

which would open the document described by the URI http://datypic.com/ACC/prod443.html.
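
The same approach extends to every product in Example 21-3. The sketch below is one possible variation: it assumes cats.xml can be retrieved from the URI in the example's caption, and it uses doc-available to skip targets that cannot be retrieved or parsed as XML, which may well be the case for HTML pages.

```xquery
for $href in doc("http://datypic.com/input/cats.xml")//product/@href
(: the base URI of an attribute is that of its parent element :)
let $absoluteURI := resolve-uri($href, base-uri($href))
return
  if (doc-available($absoluteURI))
  then doc($absoluteURI)
  else ()
```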

Escaping URIs

URIs require that some characters be escaped. Each such character is replaced by the bytes of its UTF-8 encoding, with each byte written as a % character followed by two hexadecimal digits. This applies to non-ASCII characters and to some ASCII characters, namely control characters, spaces, and several others. In addition, certain characters in URIs are separators that are intended to delimit parts of URIs, namely the characters ; , / ? : @ & = + $ [ ] and %. If one of these delimiter characters must appear in a URI with a meaning other than as a delimiter, it too must be escaped.

Three functions are available for escaping URI values: iri-to-uri, escape-html-uri, and encode-for-uri. All three replace each special character with an escape sequence in the form %xx (possibly repeating), where xx is two hexadecimal digits (in uppercase) that represent the character in UTF-8. For example, ../édition.html is changed to ../%C3%A9dition.html, with the é escaped as %C3%A9.

The three escape functions vary in which characters they escape:

iri-to-uri

Escapes only those characters that are not allowed in URIs, but not the delimiters ; , / ? : @ & = + $ [ ] or %. It is appropriate for escaping entire URIs.

escape-html-uri

Escapes characters as required by HTML agents. Specifically, it escapes everything except ASCII characters 32 to 126. It is appropriate for URIs that are to be handled by browsers.

encode-for-uri

Is the most aggressive of the three. It escapes all the characters that are required to be escaped in URIs, plus all the delimiter characters. It is appropriate for escaping pieces of URIs, such as filenames, that cannot contain delimiter characters.

Note that none of these functions check whether the argument provided is a valid URI; they simply act on the argument as if it were any string.
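
The differences among the three functions are easiest to see when they are applied to the same string. The input below, with its space, slash, and accented character, is an assumption chosen to exercise each rule.

```xquery
let $path := "my file/édition.html"
return (
  iri-to-uri($path),        (: my%20file/%C3%A9dition.html :)
  escape-html-uri($path),   (: my file/%C3%A9dition.html :)
  encode-for-uri($path)     (: my%20file%2F%C3%A9dition.html :)
)
```

Only encode-for-uri escapes the slash, and only escape-html-uri leaves the space alone, because a space is ASCII character 32.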

Working with IDs

IDs and IDREFs are used in XML to uniquely identify elements within a single document and to create references to those elements. This is useful, for example, to create footnotes and references to them, or to create hyperlinks to specific sections of HTML documents.

Typically, an attribute is used as an ID to uniquely represent the element that carries it. It is also technically possible to use a child element as an ID, but it is discouraged for reasons of compatibility with XML DTDs. The value of an ID must be a valid NCName (an XML name with no colon), which means that it must follow certain rules like starting with a letter or underscore, and not containing spaces.

Attributes named xml:id (in the http://www.w3.org/XML/1998/namespace namespace) are always considered to be IDs. Attributes with other names can also be considered IDs if they are declared to have the built-in type xs:ID in a schema, or ID in a DTD.

Example 21-4 shows an XML document that contains some ID attributes, namely the id attribute of the section element, and the fnid attribute of the fn element. Each section and fn element is uniquely identified by an ID value, such as fn1, preface, or context.

The example assumes that this document was validated with a schema that declares these attributes to be of type xs:ID. The id attributes are not automatically considered to be IDs because they are not in the appropriate namespace. In fact, the name is irrelevant if it is not xml:id; an attribute named foo can have the type xs:ID, and an attribute named id can have the type xs:integer.

The type xs:IDREF is used for an attribute that references an xs:ID. All attributes of type xs:IDREF must reference an ID in the same XML document. A common use case for xs:IDREF is to create a cross-reference to a particular section of a document. The ref attribute of the fnref element in Example 21-4 contains an xs:IDREF value (again, assuming it is validated with a schema or DTD). Its value, fn1, matches the value of the fnid attribute of the fn element.

The type xs:IDREFS represents a whitespace-separated list of one or more xs:IDREF values. In Example 21-4, the refs attribute of secRef is assumed to be of type xs:IDREFS. The first refs attribute contains only one xs:IDREF (context), while the second contains two xs:IDREF values (context and language).

Joining IDs and IDREFs

The id and idref functions allow you to reference elements based on the ID/IDREF relationship.

Given a sequence of IDs, the id function returns the elements whose xs:ID attributes match them. For example, the function call:

doc("book.xml")/id( ("preface", "context") )

returns the first two section elements, because their ID attributes have the values preface and context, respectively.

The idref function returns the nodes (attributes or elements) that refer to specified IDs, using a value of type xs:IDREF or xs:IDREFS. For example, the function call:

doc("book.xml")/idref( ("context", "language") )

returns the refs attributes of the two secRef elements, because each of these attributes is of type xs:IDREFS and contains either context or language or both.

The previous examples used literal strings for the argument. These two functions become even more useful when they are used to link referring elements to referred elements. For example, the expression:

for $child in (doc("book.xml")//section[1]/node())
return if (name($child) = "fnref")
       then concat ("[", string(doc("book.xml")/id($child/@ref)), "]")
       else string($child)

uses the id function to resolve the footnote reference in the first section. It returns:

This book introduces XQuery...
The examples are downloadable [See http://datypic.com.]...

The text that was contained in the fn element now appears where it was referenced using fnref.

Another ID-related function is element-with-id, which behaves identically to the id function when IDs are only contained in attributes, as in the previous examples. However, when an ID is contained in element content, the id function returns that element itself, while the element-with-id function returns its parent.

Constructing ID Attributes

You can create result elements with IDs by using the xml:id attribute in your element constructors. For example, the constructor:

<prod xml:id="{concat('P', $prodNum)}"/>

will create a prod element with an ID attribute that is equal to the letter P concatenated with the value of the $prodNum variable. The value of an attribute named xml:id must be a valid NCName, like any other ID value. Any whitespace in its value will be normalized automatically.
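
Placed in a complete query, the constructor might look like the following sketch. It assumes the catalog.xml document used in Example 21-2, in which each product has number and name children.

```xquery
for $prod in doc("catalog.xml")//product
(: P plus the product number, for example P557, is a valid NCName :)
return <prod xml:id="{concat('P', $prod/number)}">{string($prod/name)}</prod>
```

For the first product shown in Example 21-2, this should produce a prod element whose xml:id attribute is P557 and whose content is Fleece Pullover.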

Generating Unique ID Values

You can generate unique identifiers for nodes using the generate-id function, which accepts a node and returns a unique identifier for that node. For example, the query:

for $prod in doc("catalog.xml")//product
return <div id="{generate-id($prod)}"> ... </div>

will create a div with a unique identifier for each product. This is useful whenever you need to create unique identifiers in your output. The exact value of the ID is implementation-dependent, but it is guaranteed to be a syntactically valid xs:ID value, unique for each node in an input document, and consistent if the function is called multiple times within the execution of a query.
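
Because the value is consistent within a single query execution, the same identifier can serve as both an anchor and a link target. The following sketch builds a small index that links to each div; the HTML structure is an assumption for illustration, and catalog.xml is the document used in Example 21-2.

```xquery
let $prods := doc("catalog.xml")//product
return
  <html>
    <ul>{
      (: each link points at the div generated for the same product node :)
      for $prod in $prods
      return <li><a href="#{generate-id($prod)}">{string($prod/name)}</a></li>
    }</ul>
    {
      for $prod in $prods
      return <div id="{generate-id($prod)}">{string($prod/name)}</div>
    }
  </html>
```

The generated values may differ from one processor, or one run, to the next, so they should not be stored and reused across query executions.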
