Uniform Resource Identifiers (URIs) are used to uniquely identify resources, and they may be absolute or relative. Absolute URIs provide the entire context for identifying the resource, such as http://datypic.com/prod.html. Relative URI references are specified as the difference from a base URI, such as ../prod.html. A URI reference may also contain a fragment identifier following the # character, such as ../prod.html#shirt.
The three previous examples happen to be HTTP Uniform Resource Locators (URLs), but URIs also encompass URLs of other schemes (e.g., FTP, gopher, telnet), as well as Uniform Resource Names (URNs). URIs are not required to be dereferenceable; that is, it is not necessary for there to be a web page or other resource at http://datypic.com/prod.html in order for this to be a valid URI. Sometimes URIs just serve as names. For example, in XQuery, URIs are used as the names of namespaces and collations.
The built-in type xs:anyURI represents a URI reference. Most XQuery functions that accept URIs as arguments call for xs:string values instead, but an xs:anyURI value is also acceptable. This is because of a special type-promotion rule that allows xs:anyURI values to be automatically promoted to xs:string when a string is expected. Most of the URI-related functions return xs:anyURI values, following the philosophy of being liberal in what they accept and specific in what they produce.
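The following query is a small illustration of the promotion rule. It passes an xs:anyURI value to substring-after, whose first parameter is declared as xs:string, and it checks the type that resolve-uri returns. It is a sketch, and it reuses the sample URIs from this section.

```xquery
xquery version "1.0";

(: an xs:anyURI value, built with the xs:anyURI constructor function :)
let $uri := xs:anyURI("http://datypic.com/prod.html")
return (
  (: substring-after expects xs:string; $uri is promoted automatically :)
  substring-after($uri, "http://"),
  (: resolve-uri accepts strings but returns the more specific xs:anyURI :)
  resolve-uri("prod.html", "http://datypic.com/order.html") instance of xs:anyURI
)
```

A conforming processor should return the string datypic.com/prod.html followed by true.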
Relative URIs are interpreted relative to an absolute URI, known as a base URI. For example, the relative URI prod.html is useless unless interpreted in the context of an absolute URI. In HTML documents, the base URI is often the URI of the document itself. Suppose an HTML document is located at http://datypic.com/order.html and contains a link to prod.html. That relative URI is resolved in the context of http://datypic.com/order.html, so the link points to http://datypic.com/prod.html.
In XML documents, you can also explicitly specify a base URI using the xml:base attribute. The scope of each xml:base attribute is the element on which it appears and all its content.
Example 20-3 shows an XML document that uses the xml:base attribute on the catalog elements, with relative URI references (the href attributes) for each product. The href="prod443.html" attribute of the first product element, for example, is resolved relative to the xml:base attribute of the first catalog element, namely http://example.org/ACC/.
Example 20-3. Document using xml:base (http://datypic.com/cats.xml)
<catalogs>
  <catalog name="ACC" xml:base="http://example.org/ACC/">
    <product number="443" href="prod443.html"/>
    <product number="563" href="prod563.html"/>
  </catalog>
  <catalog name="WMN" xml:base="http://example.org/WMN/">
    <product number="557" href="prod557.html"/>
  </catalog>
</catalogs>
The base-uri function can be used to retrieve the base URI of a node. For document nodes, the base URI is the URI from which the document was retrieved. For example:
base-uri(doc("http://datypic.com/cats.xml"))
returns http://datypic.com/cats.xml.
For element nodes, the base URI comes from the element's own xml:base attribute, if any. Otherwise it comes from the xml:base value of its nearest ancestor that has one. A relative xml:base value is resolved against the base URI of the parent. For example, if $prod is bound to the first product element in cats.xml, the function call:
base-uri($prod)
returns http://example.org/ACC/, because that is the xml:base value of its parent catalog element.
If neither the element nor any of its ancestors has an xml:base attribute, its base URI defaults to the base URI of the document node, if one exists.
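The sketch below applies these rules to Example 20-3 without fetching cats.xml over the network, by parsing the same markup from a string. It assumes an XQuery 3.0 (or later) processor, because fn:parse-xml is not part of XQuery 1.0. The query reports the base URI of each product element.

```xquery
xquery version "3.0";

(: Example 20-3 parsed in memory; fn:parse-xml requires XQuery 3.0 or later :)
let $cats := parse-xml(
  '<catalogs>
     <catalog name="ACC" xml:base="http://example.org/ACC/">
       <product number="443" href="prod443.html"/>
       <product number="563" href="prod563.html"/>
     </catalog>
     <catalog name="WMN" xml:base="http://example.org/WMN/">
       <product number="557" href="prod557.html"/>
     </catalog>
   </catalogs>')
for $prod in $cats//product
(: product has no xml:base of its own, so it inherits its catalog's :)
return concat($prod/@number, " ", base-uri($prod))
```

The expected result is three strings: 443 http://example.org/ACC/, 563 http://example.org/ACC/, and 557 http://example.org/WMN/.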
The resolve-uri function takes a relative URI and a base URI as arguments and constructs an absolute URI. For example, the function call:
resolve-uri("prod.html", "http://datypic.com/order.html")
returns http://datypic.com/prod.html.
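A few more calls against the same base URI show how different kinds of relative references are resolved. The image path /images/shirt.gif is a hypothetical value chosen for illustration. The last call shows that a reference that is already absolute is returned unchanged.

```xquery
let $base := "http://datypic.com/order.html"
return (
  (: same directory as order.html: http://datypic.com/prod.html :)
  resolve-uri("prod.html", $base),
  (: path relative to the server root: http://datypic.com/images/shirt.gif :)
  resolve-uri("/images/shirt.gif", $base),
  (: the fragment identifier is kept: http://datypic.com/prod.html#shirt :)
  resolve-uri("prod.html#shirt", $base),
  (: already absolute, so the base is ignored :)
  resolve-uri("http://example.org/ACC/prod443.html", $base)
)
```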
The base URI of an individual node is set by the xml:base attribute or by the document URI. There is also a separate base URI, known as the base URI of the static context. The base URI of the static context is used in several cases:
When an element is constructed in a query, its base URI is set to the base URI of the static context, if one is defined. Otherwise, its base URI is the empty sequence.
When relative URI references are used as arguments to the doc and collection functions, or to functions that accept collations as arguments, they are resolved relative to the base URI of the static context.
When no base URI argument is provided to the resolve-uri function, it resolves the relative URI against the base URI of the static context.
The base URI of the static context can be set in the query prolog, using a base URI declaration. Its syntax is shown in Figure 20-1.
Here's an example of a base URI declaration:
declare base-uri "http://datypic.com";
The base URI must be a literal value in quotes (not an evaluated expression), and it should be a syntactically valid absolute URI.
It is also possible for the processor to set the base URI of the static context outside the scope of the query. This behavior is implementation-defined, but if the query itself is read from a file, it's reasonable to expect the base URI of the static context to default to the location of that file. The value of the base URI of the static context can be retrieved using the static-base-uri function.
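The following query sketches how a base URI declaration feeds the cases listed above. Unlike the declaration example earlier, it declares http://datypic.com/ with a trailing slash, so that relative references resolve inside that directory. The comments give the values a conforming processor should produce once the declaration is in effect.

```xquery
xquery version "1.0";

declare base-uri "http://datypic.com/";

(
  (: the declared value: http://datypic.com/ :)
  static-base-uri(),
  (: one-argument resolve-uri uses the static base URI: http://datypic.com/prod.html :)
  resolve-uri("prod.html"),
  (: a constructed element takes the static base URI: http://datypic.com/ :)
  base-uri(<item/>)
)
```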
When accessing an input document using the doc function, a URI is used to specify the document of interest. Processors interpret the URI passed to the doc function in different ways. Some, like Saxon, dereference the URI; that is, they go out to the URL and retrieve the resource at that location. Other implementations, such as those embedded in XML databases, consider the URIs to be just names. The processor might take the name and look it up in an internal catalog to find the document associated with that name.
You can find the absolute URI from which a document node was retrieved using the document-uri function. This function is basically the inverse of the doc function. Where the doc function accepts a URI and returns a document node, the document-uri function accepts a document node and returns a URI.
For example, if the variable $orderDoc is bound to the result of doc("http://datypic.com/order.xml"), then document-uri($orderDoc) returns http://datypic.com/order.xml.
In most cases, this has the same effect as calling the base-uri function on the document node.
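The sketch below contrasts the two functions. Like the example above, it assumes that the processor can retrieve http://datypic.com/order.xml. The final call, on the root element, shows that document-uri returns a value only for document nodes.

```xquery
let $orderDoc := doc("http://datypic.com/order.xml")
return (
  (: the URI the document was retrieved from :)
  document-uri($orderDoc),
  (: normally the same value for a document node :)
  base-uri($orderDoc),
  (: not a document node, so the result is the empty sequence :)
  document-uri($orderDoc/*)
)
```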
Most of the examples of the doc function in this book use a hardcoded URI, as in doc("order.xml"). However, suppose you want to open the documents referenced in Example 20-3, such as the product information page for product number 443. Its relative URI is prod443.html, and its base URI is http://example.org/ACC/. To do this, you could use:
let $href := doc("cats.xml")/catalogs/catalog[1]/product[1]/@href
let $absoluteURI := resolve-uri($href, base-uri($href))
return doc($absoluteURI)
which would open the document at http://example.org/ACC/prod443.html.
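The same approach extends to every product in the catalog. This variant is a sketch that assumes cats.xml is reachable at the URI given in Example 20-3. It only computes the absolute URIs and does not open them, since the target pages need not exist.

```xquery
for $href in doc("http://datypic.com/cats.xml")/catalogs/catalog/product/@href
(: the base URI of an attribute is the base URI of its parent element :)
return resolve-uri($href, base-uri($href))
```

The expected results are http://example.org/ACC/prod443.html, http://example.org/ACC/prod563.html, and http://example.org/WMN/prod557.html.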
URIs require that some characters be escaped. Each such character is replaced by the % character followed by two hexadecimal digits for each byte of its UTF-8 encoding. This includes non-ASCII characters and some ASCII characters, namely control characters, spaces, and several others. In addition, certain characters in URIs are delimiters that separate the parts of a URI, namely the characters ; , / ? : @ & = + $ [ ] and %. If one of these delimiter characters must appear in a URI with a meaning other than as a delimiter, it too must be escaped.
Three functions are available for escaping URI values: iri-to-uri, escape-html-uri, and encode-for-uri. All three replace each character that needs escaping with one %xx escape sequence per byte of its UTF-8 encoding, where xx is two hexadecimal digits (in uppercase). For example, iri-to-uri changes ../édition.html to ../%C3%A9dition.html. Here the é, whose UTF-8 encoding is the two bytes C3 and A9, is escaped as %C3%A9.
They vary in which characters they escape:
iri-to-uri
Escapes only those characters that are not allowed in URIs, but not the delimiters ; , / ? : @ & = + $ [ ] or %. It is appropriate for escaping entire URIs.
escape-html-uri
Escapes characters as required by HTML agents. Specifically, it escapes everything except ASCII characters 32 to 126. It is appropriate for URIs that are to be handled by browsers.
encode-for-uri
Is the most aggressive of the three. It escapes all the characters that are required to be escaped in URIs, plus all the delimiter characters. It is appropriate for escaping pieces of URIs, such as filenames, that cannot contain delimiter characters.
Note that none of these functions check whether the argument provided is a valid URI; they simply act on the argument as if it were any string.
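The query below runs all three functions on the same input so the differences are visible side by side. The path and query string are made up for illustration. A literal & must be written as &amp; inside an XQuery string literal. The last call shows that no validity check is performed.

```xquery
(: hypothetical URI with a space, a non-ASCII character, and a query string :)
let $s := "http://datypic.com/a b/édition.html?x=1&amp;y=2"
return (
  (: http://datypic.com/a%20b/%C3%A9dition.html?x=1&y=2 :)
  iri-to-uri($s),
  (: http://datypic.com/a b/%C3%A9dition.html?x=1&y=2 ; the space is ASCII 32, so it is kept :)
  escape-html-uri($s),
  (: http%3A%2F%2Fdatypic.com%2Fa%20b%2F%C3%A9dition.html%3Fx%3D1%26y%3D2 :)
  encode-for-uri($s),
  (: not a URI at all, but it is escaped anyway: not%20a%20URI :)
  iri-to-uri("not a URI")
)
```

Because encode-for-uri also escapes the slashes, it suits a single segment such as a filename, which can then be concatenated onto an unescaped base.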