Like XPath, XPointer is not in itself an XML vocabulary. Rather, it’s meant to be used within the markup in XML documents — most often in XLink or XLink-like situations requiring a URI. This chapter covers the details of coding the various XPointer forms. There are two approaches to defining XPointers as described in the XPointer Framework. Shorthand pointers use a very brief syntax, while scheme-based XPointers use a more complex syntax composed of pointer parts.
In XHTML hyperlinking, as you know, you can locate a subresource using a combination of a named anchor (the <a name="mybookmark"/> sort of tag) and a normal anchor (<a href="#mybookmark">...).
Notwithstanding the limitations of XHTML subresource hyperlinking,
the XPointer spec’s authors recognized its principal
value: simplicity. Thus, they carried it forward into XPointer,
enhanced slightly for the new standard’s use with
XML documents of any vocabulary. This form of an XPointer is called a
shorthand pointer; it includes neither scheme nor XPath expression,
just the “name” of the target
resource:
name
In an XPointer, as in an XHTML fragment identifier, the pound sign/hash mark, #, is not itself part of the XPointer or other fragment identifier. It merely serves to delimit the fragment from the full URI preceding it. Section 8.3 at the end of this chapter addresses this issue more fully.
The value of name is the value of an ID-type attribute assigned to some element in the target resource. Thus, the shorthand form is in essence a shortcut for the longer XPointer form:
xpointer(id("name"))
Consider the following simple XML document:
<gaming_platforms currency="sadly-outdated">
  <gaming_platform id="A">Atari</gaming_platform>
  <gaming_platform id="S">Sega</gaming_platform>
  <gaming_platform id="SN">Super Nintendo</gaming_platform>
  <gaming_platform id="P">Pong</gaming_platform>
</gaming_platforms>
Assuming the id attributes are in fact ID-type attributes, you could locate the Pong gaming_platform element with this simple XPointer:
P
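The lookup that a shorthand pointer implies can be sketched in a few lines of Python. This is only an illustration, not a conforming XPointer processor: the function name resolve_shorthand is hypothetical, and the sketch simply assumes that any attribute named id is an ID-type attribute, because the standard library's ElementTree parser does not consult DTD or schema declarations.

```python
import xml.etree.ElementTree as ET

DOC = """<gaming_platforms currency="sadly-outdated">
  <gaming_platform id="A">Atari</gaming_platform>
  <gaming_platform id="S">Sega</gaming_platform>
  <gaming_platform id="SN">Super Nintendo</gaming_platform>
  <gaming_platform id="P">Pong</gaming_platform>
</gaming_platforms>"""


def resolve_shorthand(root, name, id_attr="id"):
    """Return the first element whose id_attr value equals name, or None.

    Assumption: id_attr is ID-type by convention only; nothing here
    checks a DTD or schema to confirm it.
    """
    for elem in root.iter():
        if elem.get(id_attr) == name:
            return elem
    return None


root = ET.fromstring(DOC)
target = resolve_shorthand(root, "P")
print(target.tag, target.text)
```

A real processor would decide which attributes count as IDs from the DTD or, where available, the PSVI, rather than from the attribute's name.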
Chapter 4 described how the XPath id() function works, how it depends on ID attributes having been declared in DTDs, and how it depends on those DTDs having been processed. XPointer’s shorthand pointers have the same set of issues, but the XPointer Framework specification adds one more: in addition to IDs defined in XML 1.0 DTDs, it recognizes IDs defined in the W3C’s XML Schema vocabulary.
In DTDs, IDs are pretty simple. An ID is plainly identified as an attribute of type ID. The only real problem with IDs is the requirement that a DTD be provided and processed. XML Schema offers a number of different options, including IDs provided as child attributes. This means that, if XML Schema processing took place and a Post-Schema Validation Infoset (PSVI) is available, shorthand pointers must look for IDs in that PSVI.
For more on how XML Schema defines and uses IDs, see XML Schema, by Eric van der Vlist (O’Reilly).
Schema-aware ID processing is also specified for the element( ) scheme, but is not required for the xpointer( ) scheme, most likely because it builds on XPath 1.0, which is not XML Schema-aware.
Scheme-based XPointers follow this general form:
scheme(schemedata)...
The ellipsis (...) indicates that XPointers can be chained together in sequence.
Each scheme/schemedata item in the chain is referred to as a pointer part; thus, some XPointers consist of just a single pointer part and some consist of multiple pointer parts. When multiple pointer parts are used, they may be delimited from one another with optional whitespace. You’ll see more information about these chains of pointers in Section 8.2.8.
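To make the scheme(schemedata) structure concrete, here is a rough Python sketch that splits a scheme-based XPointer into its pointer parts. The function name split_pointer_parts is hypothetical. The sketch tracks balanced parentheses and treats a circumflex as an escape for a following parenthesis or circumflex; it does not validate scheme names or interpret the schemedata.

```python
def split_pointer_parts(pointer):
    """Split a scheme-based XPointer into (scheme, schemedata) pairs."""
    parts = []
    i, n = 0, len(pointer)
    while i < n:
        if pointer[i].isspace():
            # Optional whitespace may separate pointer parts.
            i += 1
            continue
        open_paren = pointer.find("(", i)
        if open_paren == -1:
            raise ValueError("expected '(' after scheme name")
        scheme = pointer[i:open_paren]
        depth, j, data = 1, open_paren + 1, []
        while j < n:
            ch = pointer[j]
            if ch == "^":
                # A circumflex escapes a parenthesis or another circumflex.
                if j + 1 >= n or pointer[j + 1] not in "()^":
                    raise ValueError("bad circumflex escape")
                data.append(pointer[j + 1])
                j += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break  # closing parenthesis of this pointer part
            data.append(ch)
            j += 1
        if depth != 0:
            raise ValueError("unbalanced parentheses")
        parts.append((scheme, "".join(data)))
        i = j + 1
    return parts


print(split_pointer_parts('xmlns(x=http://example.com/foo) xpointer(//x:a)'))
```

Each tuple in the result corresponds to one pointer part, in the left-to-right order in which a processor would consider them.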
The scheme of a pointer part functions something like the protocol of a URI (such as http:, ftp:, gopher:, and so on). Its purpose, said the previous draft of the spec, is to “[identify] the particular notation” used by the XPointer; you’ll probably agree this isn’t an especially descriptive definition. From the examples provided in the spec, though, we can come up with a simple definition like: the scheme tells us what kind of pointer part we’re dealing with.
A pointer part is typically one of three predefined kinds, each denoted by its own predefined scheme:
A scheme of xpointer (easily the most common scheme) says that this pointer part is to be used in XPointer’s typical manner: to identify some portion of an XML document of interest.
A scheme of element indicates that this pointer part will identify a portion of an XML document using a “child sequence” notation for walking the document tree.
A scheme of xmlns marks this pointer part as a prelude to the pointer parts that follow. By itself, it doesn’t locate any resource at all; it simply declares a namespace context in which succeeding pointer parts (within the same scheme-based XPointer) are to be evaluated. More information on xmlns-type schemes appears later in this chapter.
You may also use custom schemes instead of these three predefined kinds. More information on this option is found in Section 8.2.7 later in this chapter.
The schemedata contents of pointer parts vary with their schemes, and the XPointer Framework itself does very little to constrain them. Each scheme specification provides its own set of rules describing how its schemedata is to be interpreted.
When the scheme of a pointer part is xmlns, the schemedata declares the namespace associated with a particular namespace prefix used in subsequent pointer parts. This namespace declaration takes the form:
prefix=namespaceURI
For instance:
xmlns(xsl=http://www.w3.org/1999/XSL/Transform)[subsequent pointer parts]
asserts that the namespace prefix xsl: appearing in the rest of the multipart XPointer is to be associated with the indicated namespace URI (that is, in this case, the namespace for XSLT elements and attributes).
You can locate content without knowing anything at all about the specific named nodes of a target resource. This XPointer form, which uses the element( ) scheme and schemedata known as child sequences, uses a conventional tree-navigation syntax to locate the nth child of each succeeding level in the document.
Consider the gaming-platform document again:
<gaming_platforms currency="sadly-outdated">
  <gaming_platform id="A">Atari</gaming_platform>
  <gaming_platform id="S">Sega</gaming_platform>
  <gaming_platform id="SN">Super Nintendo</gaming_platform>
  <gaming_platform id="P">Pong</gaming_platform>
</gaming_platforms>
To locate the Sega gaming_platform element, aside from any other options you can use the element( ) scheme:
element(/1/2)
This simply directs the processor to walk the tree, getting the first child (that is, the root gaming_platforms element) of the root node, and then selecting that child’s second child (the Sega gaming_platform element).
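The tree walk just described can be approximated in Python with the standard library's ElementTree. The function name resolve_child_sequence is hypothetical, and the sketch handles only the bare /n/n/... form. ElementTree's default parser discards comments and processing instructions, so iterating over an element yields element children only, which matches the way child sequences count.

```python
import xml.etree.ElementTree as ET

DOC = """<gaming_platforms currency="sadly-outdated">
  <gaming_platform id="A">Atari</gaming_platform>
  <gaming_platform id="S">Sega</gaming_platform>
  <gaming_platform id="SN">Super Nintendo</gaming_platform>
  <gaming_platform id="P">Pong</gaming_platform>
</gaming_platforms>"""


def resolve_child_sequence(root, schemedata):
    """Resolve schemedata such as "/1/2" against a parsed document.

    root is the document element; returns the located element or None.
    """
    steps = [int(s) for s in schemedata.split("/")[1:]]
    # The first step picks among the root node's element children; a
    # well-formed document has exactly one, the document element.
    if not steps or steps[0] != 1:
        return None
    current = root
    for position in steps[1:]:
        children = list(current)  # element children only
        if not 1 <= position <= len(children):
            return None
        current = children[position - 1]
    return current


root = ET.fromstring(DOC)
sega = resolve_child_sequence(root, "/1/2")
print(sega.get("id"), sega.text)
```

A position that runs past the available children yields None here, standing in for the empty result a processor would report.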
Note a few things about XPointers built using the element( ) scheme. First, they can locate elements only; all other “children” (such as PIs contained within the element’s start and end tags) are effectively invisible. Second, barring some way of resetting the context in which the child sequence is to be evaluated, the very first integer in a child sequence will nearly always be 1; this follows from XML’s well-formedness requirement that a document have no more than one root element.
As the XPointer spec mentions, while a well-formed XML document must have only one root element, XPointer can be used for locating content in possibly non-well-formed external unparsed entities as well. Such entities may have multiple “root” elements, leading to the possibility of a child sequence such as:
/12/3/7
Third, although it may not be as obvious as with shorthand pointers, child sequences are also shortcuts for scheme-based XPointers. To locate the Sega gaming_platform element as described above, using element(/1/2) is effectively an abbreviated form of the scheme-based XPointer:
xpointer(/*[position()=1]/*[position()=2])
or, more simply:
xpointer(/*[1]/*[2])
Finally, child sequences are both robust (the simplest ones won’t break at all) and fragile (when they break, they’re liable to break in more or less subtle and difficult-to-diagnose ways).
To understand this last point, consider an XML document such as the following:
<books>
  <book>
    <title>XML in a Nutshell</title>
    <author>Harold &amp; Means</author>
  </book>
  <book>
    <title>DocBook: The Definitive Guide</title>
    <author>Walsh &amp; Muellner</author>
  </book>
  <book>
    <title>Learning XML</title>
    <author>Ray</author>
  </book>
  <book>
    <title>HTML &amp; XHTML: The Definitive Guide</title>
    <author>Musciano &amp; Kennedy</author>
  </book>
  <book>
    <title>Building Oracle XML Applications</title>
    <author>Muench</author>
  </book>
</books>
Using a child sequence, we could construct an XPointer to the author of the last book, which would look as follows:
element(/1/5/2)
This locates the second child of the fifth child of the first child of the root node. Note the right-to-left reading of the child sequence. This is often the simplest way to express in everyday language what a child sequence points to. Thus, this child sequence is functionally equivalent to an XPointer using the more robust xpointer( ) scheme, such as:
xpointer(//author[../title = "Building Oracle XML Applications"])
If, however, the document changes, particularly with the addition or removal of book elements, the child sequence will now point to a different author element or, worse, return an empty location-set altogether. The xpointer( ) approach, on the other hand, continues to point to the author of that book as long as a book with that title exists in the document, regardless of where in the document it is.
(Whether this is desirable, of course, depends on your application’s specific needs. Personally, I’m much more comfortable knowing what I’m pointing to than I am knowing where it’s supposed to be.)
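The difference between pointing by position and pointing by content can be demonstrated with a small Python experiment. This is a sketch under stated assumptions: the helper names by_child_sequence and by_title are hypothetical, the book list is shortened to two entries, and by_title only imitates the effect of the xpointer( ) expression shown earlier rather than evaluating XPath.

```python
import xml.etree.ElementTree as ET

BOOKS = """<books>
  <book><title>Learning XML</title><author>Ray</author></book>
  <book><title>Building Oracle XML Applications</title><author>Muench</author></book>
</books>"""


def by_child_sequence(root, steps):
    """Follow 1-based child positions; steps[0] selects the document element."""
    current = root
    for position in steps[1:]:
        children = list(current)
        if not 1 <= position <= len(children):
            return None
        current = children[position - 1]
    return current


def by_title(root, title):
    """Find the author of the book with the given title, wherever it sits."""
    for book in root.iter("book"):
        if book.findtext("title") == title:
            return book.find("author")
    return None


root = ET.fromstring(BOOKS)
title = "Building Oracle XML Applications"
before = (by_child_sequence(root, [1, 2, 2]).text, by_title(root, title).text)

# Simulate an edit to the document: a new book is added at the front.
root.insert(0, ET.fromstring(
    "<book><title>XML in a Nutshell</title>"
    "<author>Harold &amp; Means</author></book>"))
after = (by_child_sequence(root, [1, 2, 2]).text, by_title(root, title).text)

print(before)  # ('Muench', 'Muench')
print(after)   # ('Ray', 'Muench')
```

After the insertion, the positional lookup silently lands on a different author, while the content-based lookup still finds the intended one.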
Potential fragility aside, child sequences feature what can be a killer advantage: a processor can simply read only as much of a document as it needs to locate the desired node. Relying on loading the entire document — as other kinds of XPointers must — can make processing very large documents practically infeasible.
Because shorthand pointers — at least, assuming liberal use of ID-type attributes — are so convenient and simple, XPointer provides an option that combines them with child sequences. These open using the same rules for connecting names to ID values as shorthand pointers, followed by a child sequence starting at the element so identified.
Assume the following XML fragment is coded in a vocabulary in which each attribute named id has been declared as an ID-type attribute:
...
<brewery id="petes">
  <brew>
    <name>Wicked Ale</name>
    <alc_pct>5.3</alc_pct>
    <calories>174</calories>
    <carbs>17.7</carbs>
    <plato>13.65</plato>
  </brew>
  <brew>
    <name>Strawberry Blonde</name>
    <alc_pct>5.0</alc_pct>
    <calories>160</calories>
    <carbs>13.6</carbs>
    <plato>12.05</plato>
  </brew>
  <brew>
    <name>Helles Lager</name>
    <alc_pct>5.0</alc_pct>
    <calories>163</calories>
    <carbs>14.6</carbs>
    <plato>12.30</plato>
  </brew>
</brewery>
...
Using the element( ) scheme, you could locate the carbs element corresponding to Helles Lager this way:
element(petes/3/4)
Note that this combines the content awareness of a shorthand pointer with the structure awareness of a child sequence and thus avoids some of the problems associated with each.
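As a rough illustration of how a name and a child sequence combine, the following Python sketch resolves both forms of element( ) schemedata. Several things here are assumptions: the function name resolve_element_scheme is hypothetical, the breweries wrapper element is invented so that the fragment parses as a document, and attributes named id are treated as ID-type without consulting any DTD.

```python
import xml.etree.ElementTree as ET

# The <breweries> wrapper is hypothetical; the source shows only a fragment.
DOC = """<breweries>
  <brewery id="petes">
    <brew><name>Wicked Ale</name><alc_pct>5.3</alc_pct>
      <calories>174</calories><carbs>17.7</carbs><plato>13.65</plato></brew>
    <brew><name>Strawberry Blonde</name><alc_pct>5.0</alc_pct>
      <calories>160</calories><carbs>13.6</carbs><plato>12.05</plato></brew>
    <brew><name>Helles Lager</name><alc_pct>5.0</alc_pct>
      <calories>163</calories><carbs>14.6</carbs><plato>12.30</plato></brew>
  </brewery>
</breweries>"""


def resolve_element_scheme(root, schemedata, id_attr="id"):
    """Resolve "name", "name/3/4", or "/1/2" style schemedata."""
    name, _, rest = schemedata.partition("/")
    steps = [int(s) for s in rest.split("/")] if rest else []
    if name:
        # Start from the element whose (assumed ID-type) attribute matches.
        current = next(
            (e for e in root.iter() if e.get(id_attr) == name), None)
    elif steps and steps[0] == 1:
        # Bare child sequence: the first step selects the document element.
        current, steps = root, steps[1:]
    else:
        return None
    for position in steps:
        if current is None:
            return None
        children = list(current)
        if not 1 <= position <= len(children):
            return None
        current = children[position - 1]
    return current


root = ET.fromstring(DOC)
carbs = resolve_element_scheme(root, "petes/3/4")
print(carbs.tag, carbs.text)  # carbs 14.6
```

Because the walk starts at the brewery element rather than at the root, changes elsewhere in the document do not disturb the result; only changes inside that brewery can.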
When the scheme is xpointer, what appears within the required parentheses of a scheme-based XPointer is based on an XPath expression, locating some subresource within a target resource.
The XPath expression in an xpointer-type pointer part is not set off from what surrounds it with quotation marks. This makes XPointer syntax notably different from that of XSLT, XPath’s other big “client.” XPath expressions in XSLT stylesheets always appear as attribute values and therefore must be enclosed in quotation marks. (On the other hand, remember that XPointer will almost never be used by itself; rather, it will be used to locate a subresource of a resource located by XLink or a similar standard. Just as in XHTML, these resources as a whole (URIs) will almost always appear within quotation marks, as attribute values.)
For example, consider the following simple XML document:
<gaming_platforms currency="sadly-outdated">
  <gaming_platform id="A">Atari</gaming_platform>
  <gaming_platform id="S">Sega</gaming_platform>
  <gaming_platform id="SN">Super Nintendo</gaming_platform>
  <gaming_platform id="P">Pong</gaming_platform>
</gaming_platforms>
You could locate all gaming_platform elements whose names begin with S using a scheme-based XPointer such as this:
xpointer(//gaming_platform[starts-with(., "S")])
Or you could locate any given gaming_platform simply by referring to its id attribute (assuming, of course, that the attribute by that name is explicitly declared as an ID-type attribute):
xpointer(id("P"))
This latter approach is very similar to the shorthand pointers described earlier. More detailed coverage and examples of the xpointer( ) scheme appear in Chapter 9.
The XPointer Framework’s mechanisms are generic enough that developers can extend XPointer by devising custom schemes beyond the predefined element, xpointer, and xmlns. These schemes would be used in locating subresources in documents of a specific XML vocabulary.
For instance, assume a street-mapping vocabulary in which you might code a document like the following:
<map>
  <street name="Main" segment="1001_3498" xstart="34.3" ystart="679.2" xend="145.7" yend="1003.0"/>
  <street name="Main" segment="1001_3499" xstart="145.7" ystart="1003.0" xend="145.7" yend="1372.2"/>
</map>
The developers of this vocabulary could adapt the XPointer syntax to their own purposes, enabling an application to locate a particular street (consisting of all segments sharing the same name) with a scheme-based XPointer such as:
streetseg(name("Main"))
where streetseg is the custom scheme.
Note that what appears within the parentheses following such a custom scheme may or may not be an XPath expression or a namespace URI. The Framework doesn’t constrain schemes or schemedata very much, leaving the meaning and significance of the expression up to the conventions of the application in question.
When an XPointer consists of more than one pointer part, the XPointer-aware processor evaluates the XPointer from left to right. This enables the XPointer to serve either or both of two purposes: failure-proofing the XPointer and/or using namespace contexts in the XPointer.
If the first pointer part has an unrecognized scheme, or results in a resource or subresource error, the processor can fall back on the second; if the second fails, it can fall back on the third, and so on.
This makes XPointer much more robust than its simple XHTML counterpart. Assume the following XHTML hyperlink:
<a href="#speech-para2">
If the current document contains a named anchor whose value is speech-para2, all is well; the browser scrolls the document to place that named anchor at the top of the window. But if there is no named anchor, the only fallback possible for the browser is a rather crude one: to align the top of the document at the top of the window.
An XLink/XPointer solution to this problem might look like the following:
<anchor xlink:href="#xpointer(id('speech-para2')) xpointer(id('speech-para3'))">
Thus, the processor would first try to locate an element whose ID-type attribute has a value of speech-para2; if no such element is located, the processor attempts to locate an element with an ID-type attribute of speech-para3; and if that attempt fails, the processor reports a subresource error.
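The left-to-right fallback can be sketched in Python as a loop that stops at the first pointer part that locates something. This is a deliberate simplification: each pointer part is modeled as a plain ID lookup, the names find_by_id and first_match are hypothetical, the sample document is invented for the illustration, and attributes named id are assumed to be ID-type.

```python
import xml.etree.ElementTree as ET

# A made-up target document in which only the second ID exists.
DOC = """<speech>
  <para id="speech-para1">Friends, Romans, countrymen.</para>
  <para id="speech-para3">Lend me your ears.</para>
</speech>"""


def find_by_id(root, name, id_attr="id"):
    """Return the element whose (assumed ID-type) attribute equals name."""
    return next((e for e in root.iter() if e.get(id_attr) == name), None)


def first_match(root, candidate_ids):
    """Try each candidate in order, as a processor tries pointer parts."""
    for name in candidate_ids:
        found = find_by_id(root, name)
        if found is not None:
            return found
    # Nothing was located by any part: the analogue of a subresource error.
    raise LookupError("subresource error: no pointer part located anything")


root = ET.fromstring(DOC)
hit = first_match(root, ["speech-para2", "speech-para3"])
print(hit.get("id"))  # speech-para3
```

Later candidates are never examined once an earlier one succeeds, so the order of pointer parts expresses the author's order of preference.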
The other principal reason for using a multipart XPointer is to establish namespace contexts for evaluating XPath expressions in other pointer parts. When an xmlns-schemed pointer part is encountered, any pointer parts to its right may freely use elements and attributes with the associated namespace prefix. Note that to declare multiple namespaces, you must use multiple xmlns pointer parts; you can’t declare more than one namespace in a given pointer part.
Consider this example (taken directly from the XPointer xmlns( ) Scheme spec):
<doc>
  <x:a xmlns:x="http://example.com/foo">
    <x:a xmlns:x="http://example.org/bar">This element and its parent are in different namespaces.</x:a>
  </x:a>
</doc>
The following XPointer will fail, not because it fails to locate a (sub-)resource but because the reference to the x:a element can’t be unambiguously evaluated by the processor, which has been given no namespace binding for the x prefix:
xpointer(//x:a)
To get around this problem, you’d use a multipart scheme-based XPointer, such as:
xmlns(x=http://example.com/foo) xpointer(//x:a)
or:
xmlns(x=http://example.org/bar) xpointer(//x:a)
Note that you need to use an xmlns pointer part every time you need to use a namespace-qualified element or attribute name in a subsequent XPointer expression. Otherwise, the XPointer processor is unable to resolve namespace prefixes used in XPath expressions in the XPointer; the processor has no way, for example, to peek inside the target document to retrieve the namespace declarations that the latter makes.
One final note here: the spec explicitly says that the prefix used in your pointer parts needn’t match the prefixes used in the resource. In effect, each occurrence of a namespace prefix — both in your XPointers and in a target resource as located by them — behaves as though it were physically replaced by the namespace URI prior to the act of locating the (sub-)resource. Thus, the preceding two examples might just as well be coded:
xmlns(abc=http://example.com/foo) xpointer(//abc:a)
and:
xmlns(fershlugginer=http://example.org/bar) xpointer(//fershlugginer:a)
For clarity of intent, though, it never hurts to use exactly the same prefixes in an XPointer as appear in the target.
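The same prefix independence can be observed with Python's standard ElementTree, whose findall method accepts a prefix-to-URI mapping that plays a role loosely comparable to xmlns pointer parts. This is an analogy rather than an XPointer implementation; the prefixes in the mappings below are chosen to differ from the one used in the document on purpose.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <x:a xmlns:x="http://example.com/foo">
    <x:a xmlns:x="http://example.org/bar">This element and its parent are in different namespaces.</x:a>
  </x:a>
</doc>"""

root = ET.fromstring(DOC)

# The mapping supplies the namespace context; the document's own x prefix
# is never consulted, only the namespace URIs are compared.
foo = root.findall(".//abc:a", {"abc": "http://example.com/foo"})
bar = root.findall(".//fershlugginer:a",
                   {"fershlugginer": "http://example.org/bar"})

print(len(foo), foo[0].tag)
print(len(bar), bar[0].tag)
```

Each query finds exactly one element even though both are written as x:a in the document, because matching is done on the namespace URI, not on the prefix.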
When using multipart XPointers that declare namespaces, although it may seem natural to always begin with the xmlns pointer part, it’s not a requirement. In fact, not starting off with the xmlns might be less confusing or otherwise desirable in certain circumstances. For instance:
xpointer(id("JSimpson")) xmlns(mydoc=http://mydoc.com) xpointer(/mydoc:root)
Here, the “fallback” convention for multipart XPointers says to attempt to locate the element whose ID-type attribute has a value of JSimpson; if that attempt fails, fall back to the alternative: locate the root mydoc:root element of the target resource. The only requirement is that a corresponding xmlns pointer part must appear to the left of any pointer part that uses a namespace prefix; the xmlns pointer parts need not, however, precede all other pointer parts.
Also note that succeeding xmlns parts for the same prefix override one another. Thus (this is a single complete XPointer broken over two lines for clarity):
xmlns(w=http://wexample1.com) xpointer(//w:bush)
xmlns(w=http://wexample2.com) xpointer(//w:bush)
This attempts to return a location-set consisting of all bush elements in the http://wexample1.com namespace; failing that, the XPointer falls back and attempts to return a location-set consisting of all bush elements in the http://wexample2.com namespace. (Remember not to be confused by the w: prefix, which may or may not actually be used in the target document. What counts is the namespace URI, regardless of the prefix associated with it.)
You may already have concluded how to do this, based on a handful of examples in this chapter. Syntactically, including an XPointer fragment identifier in a URI is the same as doing so in XHTML: separate the XPointer from what precedes it using a hash/pound character, #, as in these examples (using scheme-based XPointer, shorthand pointer, and two flavors of the element( ) scheme, respectively):
http://www.example.com/lucy.xml#xpointer(//character[@castmember="arnaz"])
http://www.example.com/lucy.xml#ricky
http://www.example.com/lucy.xml#element(/1/2/4/3)
http://www.example.com/lucy.xml#element(cast/3)
If the XPointer is locating content in the same document in which the XPointer itself appears, simply prefix the XPointer with a hash, as in:
#xpointer(//character[@castmember="arnaz"])
As a final note, remember a couple of additional considerations when using XPointer in URIs, which I’ve pointed out in this and the previous chapter:
Escape special characters as needed, both to comply with XPointer’s own constraints and those of the standards with which XPointer must interoperate. These special characters include the circumflex (^) for escaping unbalanced parentheses, the percent sign (%), markup-significant characters such as the less-than sign (left angle bracket, <), and spaces, as well as other characters in non-ASCII encodings.
While XPointer itself does not require the use of quotation marks, XPath expressions used in scheme-based XPointers frequently do. Furthermore, because XPointers in XLink and other hyperlinking contexts are used in attribute values, you need to remain aware of nested-quotation-mark issues in the event that your scheme-based XPointers do use quotation marks of their own (such as in embedded XPath expressions).
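One of these escaping layers, percent-escaping for use in a URI, can be illustrated with Python's urllib.parse. This is a minimal sketch, not a statement of exactly which characters each specification requires you to escape: the helper name to_uri_fragment is hypothetical, and the set of characters left unescaped (the safe argument) is an assumption made for readability.

```python
from urllib.parse import quote, unquote


def to_uri_fragment(pointer):
    """Percent-escape an XPointer for use after the # in a URI.

    Assumption: parentheses, slashes, @, =, colons, and apostrophes are
    left as-is; everything else outside the unreserved set is escaped.
    """
    return quote(pointer, safe="()/@=:'")


pointer = 'xpointer(//character[@castmember="arnaz"])'
fragment = to_uri_fragment(pointer)
uri = "http://www.example.com/lucy.xml#" + fragment

print(uri)
# http://www.example.com/lucy.xml#xpointer(//character%5B@castmember=%22arnaz%22%5D)

# Undoing the percent-escapes recovers the original pointer.
assert unquote(fragment) == pointer
```

Escaping the double quotation marks also sidesteps the nested-quotation-mark problem when the URI is placed inside a double-quoted attribute value; using apostrophes inside the XPath expression is another way to achieve the same thing.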