All About XPointers

At the beginning of this chapter, I looked at a link that used an XPointer to locate a specific element in a document; that example looked like this:

<MOVIE_REVIEW xmlns:xlink = "http://www.w3.org/1999/xlink"
    xlink:type = "simple"
    xlink:show = "new"
    xlink:href = "http://www.starpowdermovies.com/reviews.xml#
        xpointer(/child::*[position()=126]/child::*[position()=last()])">
    Mr. Blandings Builds His Dream House
</MOVIE_REVIEW>

You can see the XPointer part here:

xpointer(/child::*[position()=126]/child::*[position()=last()])

This XPointer is appended to the URI I'm using here, following a # character.
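To make the split between the URI and the XPointer concrete, here's a minimal sketch in Python. It assumes a shortened version of the XPointer above, and the regular expression that peels off the xpointer(...) wrapper is my own illustration, not something defined by the XPointer specification.

```python
import re
from urllib.parse import urldefrag

# A shortened version of the link target used in this chapter.
uri = ("http://www.starpowdermovies.com/reviews.xml#"
       "xpointer(/child::*[position()=126])")

# urldefrag() splits a URI reference at the # character.
document_uri, fragment = urldefrag(uri)

# Illustrative pattern: a scheme name followed by a parenthesized body.
match = re.fullmatch(r"(?P<scheme>[\w-]+)\((?P<body>.*)\)", fragment)
scheme, body = match.group("scheme"), match.group("body")

print(document_uri)  # the document to retrieve
print(scheme)        # the pointer scheme name
print(body)          # the location path inside xpointer(...)
```

The part before the # identifies the document; everything after it is the XPointer that is evaluated against that document.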

You might notice that this XPointer expression looks a lot like the XPath expressions we used in Chapter 13, and with good reason—XPointers are built on XPaths, with certain additions that I'll note here.

Because XPointers are built on XPaths, they have all the power of XPaths. Among other things, this means that you can use an XPointer made up of location steps that target an individual location in a document without having to add any markup to that document. You can also use the id() function to target specific elements if you do want to add ID attributes to those elements.

However, because XPointers extend XPaths, there are some differences. The biggest is that XPointers enable you to select points and ranges in addition to the normal XPath nodes, because users may select parts of a document with the mouse rather than whole nodes. A point is just what it sounds like: a specific location in a document. A range is made up of all the XML between two points, which can include parts of elements and text strings.

To support points and ranges, XPointer extends the idea of nodes into locations. Every location is an XPath node, a point, or a range. Therefore, node sets become location sets in the XPointer specification.

How do you create an XPointer? Like XPaths, XPointers are made of location paths that are divided into location steps, separated by the / character. A location step is made up of an axis, a node test, and zero or more predicates, like this:

axis::node_test[predicate]

For example, in the expression

child::PLANET[position() = 5]

child is the name of the axis, PLANET is the node test, and [position() = 5] is a predicate.
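The three parts of a location step can be pulled apart mechanically. Here's a rough sketch in Python; parse_step is a hypothetical helper of my own, and it assumes simple steps whose predicates contain no nested brackets, so it is not a full XPath parser.

```python
import re

# axis:: (optional), then a node test, then zero or more [predicate] groups.
STEP = re.compile(
    r"^(?:(?P<axis>[\w-]+)::)?(?P<test>[^\[\]]+?)(?P<preds>(?:\[[^\]]*\])*)$"
)

def parse_step(step):
    """Split one location step into (axis, node_test, predicates)."""
    m = STEP.match(step.strip())
    if m is None:
        raise ValueError(f"not a simple location step: {step!r}")
    axis = m.group("axis") or "child"  # child is the default axis
    node_test = m.group("test").strip()
    predicates = [p.strip() for p in re.findall(r"\[([^\]]*)\]", m.group("preds"))]
    return axis, node_test, predicates

print(parse_step("child::PLANET[position() = 5]"))
# ('child', 'PLANET', ['position() = 5'])
```

A step with no explicit axis, such as PLANET[5], comes back with the child axis filled in, which mirrors the abbreviated syntax.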

You can create location paths with one or more location steps, such as /descendant::PLANET/child::NAME, which selects all the <NAME> elements that have a <PLANET> parent.

XPointers augment what's available with XPaths, so I'm going to take a look at these three parts—axes, node tests, and predicates—for XPointers now.

XPointer Axes

The XPointer axes are the same as the XPath axes, and we're already familiar with them. Axes tell you which direction you should search and give you a starting position to search from. Here's the list of possible axes:

Axis                  Description
ancestor              Holds the ancestors of the context node. The ancestors of the context node are the parent of the context node and the parent's parent and so forth, back to and including the root node.
ancestor-or-self      Holds the context node and the ancestors of the context node.
attribute             Holds the attributes of the context node.
child                 Holds the children of the context node.
descendant            Holds the descendants of the context node. A descendant is a child, or a child of a child, and so on.
descendant-or-self    Contains the context node and the descendants of the context node.
following             Holds all nodes in the same document as the context node that come after the context node.
following-sibling     Holds all the following siblings of the context node. A sibling is a node on the same level as the context node.
namespace             Holds the namespace nodes of the context node.
parent                Holds the parent of the context node.
preceding             Contains all nodes that come before the context node.
preceding-sibling     Contains all the preceding siblings of the context node. A sibling is a node on the same level as the context node.
self                  Contains the context node.
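To get a feel for what a few of these axes hold, here's a small sketch using Python's standard xml.etree.ElementTree module. ElementTree has no axis support of its own, so the three helper functions are my own illustrations. They only see element nodes, and because ElementTree has no root node above the top-level element, the ancestor list stops at that element.

```python
import xml.etree.ElementTree as ET

DOC = ("<PLANETS><PLANET><NAME>Mercury</NAME><MASS>.0553</MASS>"
       "<DAY>58.65</DAY></PLANET></PLANETS>")
root = ET.fromstring(DOC)

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """ancestor axis: parent, parent's parent, and so on."""
    result = []
    while elem in parent_of:
        elem = parent_of[elem]
        result.append(elem)
    return result

def following_siblings(elem):
    """following-sibling axis: later children of the same parent."""
    siblings = list(parent_of[elem])
    return siblings[siblings.index(elem) + 1:]

def preceding_siblings(elem):
    """preceding-sibling axis: earlier children, nearest first."""
    siblings = list(parent_of[elem])
    return siblings[:siblings.index(elem)][::-1]

mass = root.find("PLANET/MASS")  # the context node
print([e.tag for e in ancestors(mass)])           # ['PLANET', 'PLANETS']
print([e.tag for e in following_siblings(mass)])  # ['DAY']
print([e.tag for e in preceding_siblings(mass)])  # ['NAME']
```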

Although XPointers use the same axes as XPaths, there are some new node tests. We'll take a look at these next.

XPointer Node Tests

Here are the node tests you can use with XPointers, and what they match:

Node Test                   Matches
*                           Any element
node()                      Any node
text()                      A text node
comment()                   A comment node
processing-instruction()    A processing instruction node
point()                     A point in a resource
range()                     A range in a resource

Note in particular the last two—point() and range(). These correspond to the two new constructs added in XPointers, points and ranges, and I'll talk more about them at the end of this chapter.

To extend XPath to include points and ranges, the XPointer specification created the concept of a location, which can be an XPath node, a point, or a range. However, node tests are still called node tests, not location tests; when discussing node tests, the XPointer specification specifically extends the definition of node types to include points and ranges so that node tests can work with those types. For the moment, then, we're stuck with the idea that locations can be XPath nodes, points, or ranges—and that the node types in node tests can also be XPath nodes, points, or ranges. Presumably, this contradiction will be cleared up in the final XPointer recommendation.

XPointer Predicates

XPointers support the same types of expressions as XPaths. As in Chapter 13, these are the possible types of expressions you can use in predicates (refer to Chapter 13 for more information):

  • Node sets

  • Booleans

  • Numbers

  • Strings

  • Result tree fragments

As we saw in Chapter 13, there are functions to deal with all these types in XPath. The XPointer specification supports all those functions and also adds functions to cast subexpressions to the particular types defined in XPath, such as boolean(), string(), text(), and number(). It also adds the function unique(), to enable you to test whether an XPointer locates a single location rather than multiple locations or no locations.

XPointer also makes some additions to the functions that return location sets, and I'll take a look at those functions now.

XPointer Location Set Functions

Four XPointer functions return location sets:

Function    Description
id()        Returns all the elements with a specific ID
root()      Returns a location set with one location, the root node
here()      Returns a location set with one location, the current location
origin()    Same as here(), except that this function is used with out-of-line links

The id() function is the one we saw in Chapter 13 when discussing XPath. You can use this function to return all locations with a given ID.

The root() function works just like the / character: it refers to the root node. (The root node is not the same as the document element: the root node corresponds to the very beginning of the prolog, while the document element is the top-level element in the document.) The root() function is not actually part of the XPath specification, but the XPointer specification refers to it as if it were. Whether it will be included in the final XPointer recommendation is unclear.

The here() function refers to the current element. This is useful because XPointers are usually stored in text nodes or attribute values, and you might want to refer to the current element (not just the current node). For example, you might want to refer to the second previous <NAME> sibling element of the element that contains an XPointer, and you can use an expression like this to do so:

here()/preceding-sibling::NAME[position() = 2]

The origin() function is much like the here() function, but you use it with out-of-line links. It refers to the original element, which may be in another document, from which the current link was activated. This can be very helpful if the link itself is in a linkbase and needs to refer not to the element that the link is in, but the original element from which the link is activated.

You can use the abbreviated XPath syntax in XPointers as well. I'll take a look at a few examples, using planets.xml as the document we'll be navigating:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>

    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DAY UNITS="days">58.65</DAY>
        <RADIUS UNITS="miles">1516</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>

    <PLANET>
        <NAME>Venus</NAME>
        <MASS UNITS="(Earth = 1)">.815</MASS>
        <DAY UNITS="days">116.75</DAY>
        <RADIUS UNITS="miles">3716</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
        <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
    </PLANET>

    <PLANET>
        <NAME>Earth</NAME>
        <MASS UNITS="(Earth = 1)">1</MASS>
        <DAY UNITS="days">1</DAY>
        <RADIUS UNITS="miles">2107</RADIUS>
        <DENSITY UNITS="(Earth = 1)">1</DENSITY>
        <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
    </PLANET>

</PLANETS>

Here are a few XPointer examples—note that, as with XPath, you can use the [] operator; here, it extracts a particular location from a location set.

Example                         Description
PLANET                          Returns the <PLANET> element children of the context node.
*                               Returns all element children of the context node.
text()                          Returns all text node children of the context node.
@UNITS                          Returns the UNITS attribute of the context node.
@*                              Returns all the attributes of the context node.
PLANET[3]                       Returns the third <PLANET> child of the context node.
PLANET[last()]                  Returns the last <PLANET> child of the context node.
*/PLANET                        Returns all <PLANET> grandchildren of the context node.
/PLANETS/PLANET[3]/NAME[2]      Returns the second <NAME> element of the third <PLANET> element of the <PLANETS> element.
//PLANET                        Returns all the <PLANET> descendants of the document root.
PLANETS//PLANET                 Returns the <PLANET> element descendants of the <PLANETS> element children of the context node.
//PLANET/NAME                   Returns all the <NAME> elements that have a <PLANET> parent.
.                               Returns the context node itself.
.//PLANET                       Returns the <PLANET> element descendants of the context node.
..                              Returns the parent of the context node.
../@UNITS                       Returns the UNITS attribute of the parent of the context node.
PLANET[NAME]                    Returns the <PLANET> children of the context node that have <NAME> children.
PLANET[NAME="Venus"]            Returns the <PLANET> children of the context node that have <NAME> children with text equal to "Venus".
PLANET[@UNITS = "days"]         Returns all <PLANET> children of the context node that have a UNITS attribute with value "days".
PLANET[6][@UNITS = "days"]      Returns the sixth <PLANET> child of the context node, only if that child has a UNITS attribute with value "days". Note that this is not the same as PLANET[@UNITS = "days"][6], which returns the sixth of the <PLANET> children that have a UNITS attribute with value "days".
PLANET[@COLOR and @UNITS]       Returns all the <PLANET> children of the context node that have both a COLOR attribute and a UNITS attribute.
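You can try several of these abbreviated forms with Python's standard xml.etree.ElementTree module. Treat this as a sketch only: ElementTree implements a limited subset of XPath, not XPointer, and it accepts only relative paths from a context element. The document below is a trimmed copy of planets.xml, and the names() helper is my own.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of planets.xml (only NAME and DAY are kept).
PLANETS_XML = """<PLANETS>
    <PLANET>
        <NAME>Mercury</NAME>
        <DAY UNITS="days">58.65</DAY>
    </PLANET>
    <PLANET>
        <NAME>Venus</NAME>
        <DAY UNITS="days">116.75</DAY>
    </PLANET>
    <PLANET>
        <NAME>Earth</NAME>
        <DAY UNITS="days">1</DAY>
    </PLANET>
</PLANETS>"""

context = ET.fromstring(PLANETS_XML)  # context node: the <PLANETS> element

def names(path):
    """Return the NAME text of every <PLANET> element the path selects."""
    return [planet.findtext("NAME") for planet in context.findall(path)]

print(names("PLANET"))                # ['Mercury', 'Venus', 'Earth']
print(names("PLANET[3]"))             # ['Earth']
print(names("PLANET[last()]"))        # ['Earth']
print(names("PLANET[NAME='Venus']"))  # ['Venus']
print([n.text for n in context.findall(".//PLANET/NAME")])
print([d.get("UNITS") for d in context.findall("PLANET/DAY[@UNITS='days']")])
```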

In XPath, you can locate data only at the node level. That's fine when you're working with software that handles XML data in terms of nodes, such as XSL transformations, but it's not good enough for all purposes. For example, a user working with a displayed XML document might be able to click the mouse at a particular point, or even select a range of XML content. (Note that such ranges might not start and end on node boundaries at all—they might contain parts of various trees and subtrees.) To give you finer control over XML data, you can work with points and ranges in XPointer.

Using XPointer Points

How do you define a point in the XPointer specification? To do so, you must use two items—a node, and an index that can hold a positive integer or zero. The node specifies an origin for the point, and the index indicates how far the point you want is from that origin.

But what should the index be measured in terms of—characters in the document, or number of nodes? In fact, there are two different types of points, and the index value you use is measured differently for those types.

Node-points

When the origin node, also called the container node, of a point can have child nodes (which means that it's an element node or the root node), then the point is called a node-point.

The index of a node-point is measured in child nodes. Here, the index of a node-point must be equal to or less than the number of child nodes in the origin node. If you use an index of zero, the point is immediately before any child nodes. An index of 5 locates a point immediately after the fifth child node.

You can use axes with node-points: A node-point's siblings are the children of the container node before or after the node-point. Points don't have any children, however.
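Here's one way to picture a node-point in Python. The node_point function is a hypothetical helper of mine, and it assumes that only element children count, because ElementTree does not expose text nodes or comments as separate children. It returns the children that fall before and after the point.

```python
import xml.etree.ElementTree as ET

def node_point(container, index):
    """Return (children before the point, children after the point)."""
    children = list(container)
    if not 0 <= index <= len(children):
        raise IndexError("index must be between 0 and the number of child nodes")
    return children[:index], children[index:]

planet = ET.fromstring(
    "<PLANET><NAME>Mercury</NAME><MASS>.0553</MASS><DAY>58.65</DAY></PLANET>"
)

before, after = node_point(planet, 0)  # immediately before any child nodes
print([e.tag for e in before], [e.tag for e in after])

before, after = node_point(planet, 2)  # immediately after the second child
print([e.tag for e in before], [e.tag for e in after])
```

An index of 3 is still legal here (the point after the last child), but an index of 4 raises an error because the container has only three children.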

Character-points

If the origin node can't contain any child nodes, only text, then the index is measured in characters. Points like these are called character-points.

The index of a character-point must be a positive integer or zero, and less than or equal to the length of the text string in the node. If the index is zero, the point is immediately before the first character; an index of 5 locates the point immediately after the fifth character. Character-points do not have preceding or following siblings, or children.

For example, you can treat <DOCUMENT> as a container node in this document:

<DOCUMENT>
Hi there!
</DOCUMENT>

The text Hi there! is nine characters long, so there are nine character-points here, one before every character. The character-point at index 0 is right before the first character, H; the character-point at index 1 is right before the i; and so on.

In addition, you should note that the XPointer specification collapses all consecutive whitespace into a single space, so four spaces is the same as one space when calculating an index for a character-point. Also, you cannot place points inside a start tag, end tag, processing instruction, or comment, or inside any markup.

Creating Points

To create a point, you use the start-point() function with a predicate, like this:

start-point()[position()=10]

Here's an example; say that I wanted to position a point just before the e in the text in Mercury's <NAME> element:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>

    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DAY UNITS="days">58.65</DAY>
        <RADIUS UNITS="miles">1516</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>
    .
    .
    .

In this case, I could use an expression like this to refer to the point right before the character e:

xpointer(/PLANETS/PLANET[1]/NAME/text()/start-point()[position() = 1])

Similarly, I can access the point right before the 6 in the text in Mercury's <DAY> element, 58.65 (which, of course, is text, not a number), this way:

xpointer(/PLANETS/PLANET[1]/DAY/text()/start-point()[position() = 3])
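A character-point index behaves much like a position between characters in a string. This Python sketch makes that analogy explicit; character_point is a hypothetical helper, and the element text is fetched with ElementTree rather than with a real XPointer processor.

```python
import xml.etree.ElementTree as ET

def character_point(text, index):
    """Return (text before the point, text after the point)."""
    if not 0 <= index <= len(text):
        raise IndexError("index must be between 0 and the length of the text")
    return text[:index], text[index:]

planets = ET.fromstring(
    "<PLANETS><PLANET><NAME>Mercury</NAME><DAY>58.65</DAY></PLANET></PLANETS>"
)
name = planets.findtext("PLANET[1]/NAME")
day = planets.findtext("PLANET[1]/DAY")

print(character_point(name, 1))  # ('M', 'ercury'): the point before the e
print(character_point(day, 3))   # ('58.', '65'): the point before the 6
```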

Using XPointer Ranges

You can create ranges with two points, a start point and an end point, as long as they are in the same document and the start point is not after the end point. (If the start point and the end point are the same, the range is collapsed.) A range is all of the XML structure between those two points.

A range doesn't have to be a neat subsection of a document; it can extend from one subtree to another in the document, for example. All you need are a valid start point and a valid end point in the same document.

Creating Ranges

To create a range, you use two location paths, separated with the keyword to in the xpointer() function. For example, here's how to create a range that includes the whole word Mercury in planets.xml:

xpointer(/PLANETS/PLANET[1]/NAME/text()/start-point()[position() = 0] to
/PLANETS/PLANET[1]/NAME/text()/start-point()[position() = 7])

Here's how to create a range that includes the entire text value in Mercury's <RADIUS> element, 1516:

xpointer(/PLANETS/PLANET[1]/RADIUS/text()/start-point()[position() = 0] to
/PLANETS/PLANET[1]/RADIUS/text()/start-point()[position() = 4])
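When both points sit in the same text node, a range corresponds to a string slice. The sketch below assumes that simple case only (range_text is my own helper); ranges that cross element boundaries need a real XPointer processor.

```python
def range_text(text, start, end):
    """Return the characters between two character-points in one text node."""
    if not 0 <= start <= end <= len(text):
        raise ValueError("the start point must not come after the end point")
    return text[start:end]

print(range_text("Mercury", 0, 7))  # the whole word
print(range_text("1516", 0, 4))     # the entire text value
print(repr(range_text("Mercury", 3, 3)))  # a collapsed range covers nothing
```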

Range Functions

The XPointer specification adds a number of functions to those available in XPath to handle ranges:

Function                      Description
range(location-set)           This function takes the locations you pass to it and returns a range that completely covers each location. For example, an element location is converted to a range by returning the element's parent as the origin node, the start point as the number of previous siblings the element has, and the end point as one greater than the start point. In other words, this function is intended to cover locations with ranges.
range-inside(location-set)    This function returns a range or ranges covering the contents of each location in the argument location set. For example, if you pass an element location, the result is a range that encloses all that is inside the element.
start-point(location-set)     This function returns a location set with start points in it. Those points are the start points of ranges that would cover the passed locations. For example, start-point(//PLANET[2]) would return the point immediately before the second <PLANET> element in the document, and start-point(//PLANET) would return a location set of the points just before each <PLANET> element.
end-point(location-set)       This is the same as start-point(), except that it returns the corresponding end points of the ranges that cover the locations passed to it.
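The covering range for an element, as described above, is easy to compute by hand. Here's a sketch in Python; covering_range is a hypothetical helper, and it counts only element siblings because ElementTree does not expose text nodes as separate children.

```python
import xml.etree.ElementTree as ET

def covering_range(root, elem):
    """Return (container, start index, end index) for the range covering elem."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of[elem]            # the origin (container) node
    start = list(parent).index(elem)    # number of previous siblings
    return parent, start, start + 1     # end point is one greater

planets = ET.fromstring(
    "<PLANETS><PLANET><NAME>Mercury</NAME></PLANET>"
    "<PLANET><NAME>Venus</NAME></PLANET></PLANETS>"
)
venus = planets.find("PLANET[2]")
container, start, end = covering_range(planets, venus)
print(container.tag, start, end)  # PLANETS 1 2
```

The start index is also the node-point that start-point() would return for the element, and the end index is the node-point that end-point() would return.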

Using String Ranges

The XPointer specification also includes a function for basic string matching, string-range(). This function returns a location set with one range for every nonoverlapping match to the search string. The match operation is case-sensitive.

You can also pass optional index and length arguments. The index gives the character position, counted from 1 at the start of the match, where the range should begin, and the length gives how many characters should be in the range. Here's how you use string-range() in general:

string-range(location_set, string, [index, [length]])

Matching an Empty String

An empty string, "", matches to the location immediately before any character, so you can use an empty string to match to the very beginning of any string.


For example, this expression returns a location set containing ranges covering all matches to the word "Saturn":

string-range(/, "Saturn")

To extract a specific match from the location set returned, you use the [] operator. For example, this expression returns a range covering the second occurrence of "Saturn" in the document:

string-range(/, "Saturn")[2]

This expression returns a range covering the third occurrence of the word "Jupiter" in the <NAME> element of the sixth <PLANET> element in a document:

string-range(//PLANET[6]/NAME, "Jupiter")[3]

You can also specify the range you want to return using the index (which starts with a value of 1) and length arguments. For example, this expression returns a range covering the letters er in the third occurrence of the word "Jupiter" in the <NAME> element of the sixth <PLANET> element:

string-range(//PLANET[6]/NAME, "Jupiter", 6, 2)[3]

If you want to locate a specific point, you can create a collapsed (zero-length) range, like this:

string-range(//PLANET[6]/NAME, "Jupiter", 6, 0)[3]

Another way to get a specific point is to use the start-point() function, which returns the start point of a range:

start-point(string-range(//PLANET[6]/NAME, "Jupiter", 6, 2)[3])

Here's an expression that locates the second @ character in any text node in the document and the five characters following it:

string-range(/, "@", 1, 6)[2]
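To check the index and length arithmetic in these examples, here's a rough Python model of string-range(). It is an assumption-laden sketch: it searches one plain string rather than the text of a document, the function name and the returned (start, end) offset pairs are my own, and Python list indexes start at 0 where the [] operator in XPointer starts at 1.

```python
def string_range(text, pattern, index=1, length=None):
    """Return (start, end) offsets, one pair per nonoverlapping match."""
    if length is None:
        length = len(pattern)  # by default, cover the whole match
    ranges = []
    pos = 0
    while pos < len(text):
        hit = text.find(pattern, pos)  # case-sensitive search
        if hit == -1 or hit >= len(text):
            break
        start = min(hit + index - 1, len(text))  # index is 1-based
        end = min(start + length, len(text))
        ranges.append((start, end))
        pos = hit + max(len(pattern), 1)  # skip past this match
    return ranges

text = "Jupiter Jupiter Jupiter"
start, end = string_range(text, "Jupiter", 6, 2)[2]  # third match, like [3]
print(text[start:end])                               # er
print(string_range(text, "Jupiter", 6, 0)[2])        # (21, 21), a collapsed range

mail = "joe@example.com, ann@example.org"
start, end = string_range(mail, "@", 1, 6)[1]        # second match, like [2]
print(mail[start:end])                               # @examp
```

With an empty search string, this model returns one collapsed range before every character, which matches the behavior described for empty strings.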

XPointer Abbreviations

Because it's so common to refer to elements by location or ID, XPointer adds a few abbreviated forms of reference. Here's an example; suppose that you wanted to locate Venus's <DAY> element in planets.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>

    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DAY UNITS="days">58.65</DAY>
        <RADIUS UNITS="miles">1516</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>

    <PLANET>
        <NAME>Venus</NAME>
        <MASS UNITS="(Earth = 1)">.815</MASS>
        <DAY UNITS="days">116.75</DAY>
        <RADIUS UNITS="miles">3716</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
        <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
    </PLANET>
    .
    .
    .

You could do so with this rather formidable expression:

http://www.starpowdermovies.com/planets.xml#
xpointer(/child::*[position()=1]/
child::*[position()=2]/child::*[position()=3])

As you know from Chapter 13, the child:: part is optional in XPath expressions, and the predicate [position() = x] can be abbreviated as [x]. In XPointer, you can abbreviate this still more, omitting the [ and ]. Here's the result, which is fairly compact:

http://www.starpowdermovies.com/planets.xml#1/2/3

When you see location steps made up of single numbers in this way, those location steps correspond to the location of elements.
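Here's a small Python sketch of how such a sequence of numbers can be resolved. The resolve_child_sequence function is hypothetical, it counts only element children, and the document is a trimmed copy of planets.xml.

```python
import xml.etree.ElementTree as ET

def resolve_child_sequence(document_element, sequence):
    """Follow steps such as 1/2/3, counting element children from 1."""
    steps = [int(step) for step in sequence.split("/")]
    if steps[0] != 1:
        return None  # the root has exactly one element child
    current = document_element
    for step in steps[1:]:
        children = list(current)
        if not 1 <= step <= len(children):
            return None  # no such child
        current = children[step - 1]
    return current

planets = ET.fromstring("""<PLANETS>
  <PLANET><NAME>Mercury</NAME><MASS>.0553</MASS><DAY>58.65</DAY></PLANET>
  <PLANET><NAME>Venus</NAME><MASS>.815</MASS><DAY>116.75</DAY></PLANET>
</PLANETS>""")

target = resolve_child_sequence(planets, "1/2/3")
print(target.tag, target.text)  # DAY 116.75
```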

In a similar way, you can use words as location steps, not just numbers, if those words correspond to ID values of elements in the document. For example, say that I give Venus's <PLANET> element the ID "Planet_Of_Love". (Here I'm assuming that this element's ID attribute is declared with the type ID in a DTD.)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>

    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DAY UNITS="days">58.65</DAY>
        <RADIUS UNITS="miles">1516</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>

    <PLANET ID = "Planet_Of_Love">
        <NAME>Venus</NAME>
        <MASS UNITS="(Earth = 1)">.815</MASS>
        <DAY UNITS="days">116.75</DAY>
        <RADIUS UNITS="miles">3716</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
        <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
    </PLANET>
    .
    .
    .

Now you could reach the <DAY> element in Venus's <PLANET> element like this:

http://www.starpowdermovies.com/planets.xml#
xpointer(//child::*[id("Planet_Of_Love")]/child::*[position()=3])

However, there's also an abbreviated version that's much shorter. In this case, I use the fact that you can use an element's ID value as a location step, and the result looks like this:

http://www.starpowdermovies.com/planets.xml#Planet_Of_Love/3

As you can see, this form is considerably shorter.

In this example, I used the id() function; to use that function, you should declare ID attributes so that they have the type ID. However, not all documents have a DTD or schema, so XPointer enables you to specify alternative patterns using multiple XPointers. Here's how that might look in this case, where I specify two XPointers in one location step:

http://www.starpowdermovies.com/planets.xml#
xpointer(id("Planet_Of_Love"))xpointer(//*[@ID="Planet_Of_Love"])/3

If the first XPointer, which relies on the id() function, fails, the second XPointer is supposed to be used instead, and that one locates any element that has an attribute named ID with the required value. It remains to be seen how much of this syntax applications will actually implement.
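The try-one-then-the-next idea can be sketched in Python like this. The first_match helper is my own, and both alternatives here are plain attribute searches, because ElementTree does not read DTDs and so cannot implement id(). The first path deliberately looks for a lowercase id attribute, which finds nothing because XML names are case-sensitive, so the second path is used.

```python
import xml.etree.ElementTree as ET

def first_match(context, paths):
    """Try each path in order and return the first nonempty result."""
    for path in paths:
        found = context.findall(path)
        if found:
            return found
    return []

planets = ET.fromstring("""<PLANETS>
  <PLANET><NAME>Mercury</NAME><MASS>.0553</MASS><DAY>58.65</DAY></PLANET>
  <PLANET ID="Planet_Of_Love"><NAME>Venus</NAME><MASS>.815</MASS><DAY>116.75</DAY></PLANET>
</PLANETS>""")

matches = first_match(planets, [
    ".//*[@id='Planet_Of_Love']",  # fails: the attribute is named ID
    ".//*[@ID='Planet_Of_Love']",  # succeeds
])
day = list(matches[0])[2]  # the third child, like the trailing /3
print(day.tag, day.text)   # DAY 116.75
```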

That's it for XLinks and XPointers. As you can see, there's a lot of power here—far more than with simple HTML hyperlinks. However, the XLink and XPointer standards have been proposed for quite a few years now, and there have been practically no implementations of them. Hopefully the future will bring more concrete results.

In the next chapter, I'm going to start looking at some popular XML applications in depth, starting with the most popular one of all: XHTML.
