Predicate filters


Location paths are quite indiscriminate. For example, the path 'book/chapter/para' selects all of the paragraphs in all of the chapters. But there is often a need to target a more selective set of elements, and possibly a single element instance.

A predicate filter is used to qualify any step in the path. The list of matches at each step is reduced by asking questions about the nodes in this list. Square brackets, '[' and ']', are used to hold the predicate:

para[....]

The predicate filter holds one or more test expressions. The result of the test (or tests) is a single boolean value, and the selection only succeeds when the value is true.

Position tests

The 'position()' function returns the sequential location of the element being tested within the current list of matches. This value can be compared to a desired location. The following example selects only the first paragraph:

para[position() = 1]

This form of query can be abbreviated to just a number. The following test is equivalent to the one above:

para[1]

This number does not refer to the position of the element among all sibling elements, but only among those elements already selected by the pattern (the current 'context list'). In this case, as only Paragraph elements are selected, the position refers to the first Para element in the list. To select a Paragraph element when it is the first child of its parent, a more complex expression is required. The following example illustrates this, and shows that it is necessary to first select all the sibling elements, using '*', then check both the position and the name of each element. In this case, the element is only selected if it is the first in the list, and is also a Paragraph element (see below for the purpose of the 'and' expression):

*[position() = 1 and self::para]

Locating the last element, when the number of elements in the list is unknown, can be achieved using the 'last()' function. The following pattern applies to the last paragraph among the selected Para elements (which is not necessarily the last sibling):

para[last()]
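The difference between these position tests can be tried out with a short Python sketch. It assumes the standard library module 'xml.etree.ElementTree', which supports only a small subset of XPath (including 'para[1]' and 'para[last()]'). The sample document is invented for illustration, and the first-child test is written out in plain Python because that subset has no 'position()' function or 'self::' axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first child of the chapter is a title, not a para.
chapter = ET.fromstring(
    "<chapter><title>T</title><para>one</para><para>two</para>"
    "<note>n</note></chapter>"
)

# para[1] and para[last()] count only among the para elements.
first_para = chapter.findall("para[1]")
last_para = chapter.findall("para[last()]")

# Plain-Python model of *[position() = 1 and self::para]:
# take all children, keep the first one only if it is a para.
children = list(chapter)
first_child_if_para = [c for c in children[:1] if c.tag == "para"]

print([p.text for p in first_para])   # ['one']
print([p.text for p in last_para])    # ['two']
print(first_child_if_para)            # [] because the first child is a title
```

Here 'para[1]' finds a paragraph even though it is the second child, while the stricter test finds nothing.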

The 'count()' function also returns a number, but requires a parameter to specify the node list to count. Using this function, it is possible to discover how many occurrences of a particular element there are. The following example selects notes that contain a single paragraph:

child::note[count(child::para) = 1]
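As a rough illustration, the same selection can be modelled in Python. This is only a sketch: the sample document is invented, and the counting is done with an ordinary list length because the standard library's 'xml.etree.ElementTree' module does not implement 'count()'.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with notes holding one, two and zero paragraphs.
doc = ET.fromstring(
    "<section>"
    "<note><para>only one</para></note>"
    "<note><para>a</para><para>b</para></note>"
    "<note><title>no paragraphs</title></note>"
    "</section>"
)

# Model of child::note[count(child::para) = 1]
single_para_notes = [
    note for note in doc.findall("note")
    if len(note.findall("para")) == 1
]

print(len(single_para_notes))  # 1
```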

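An element may have a unique identifier, and this value can be used to select the element. The 'id()' function is used for this purpose. It returns the element that carries the given identifier, so it normally forms a location path by itself; placed inside a predicate, it would only test whether such an element exists anywhere in the document:

id("summary")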

The 'name()' function returns the name, as a string, of the node specified as a parameter. When the parameter is '.', it returns the name of the current node.

Contained element tests

The name of an element can appear in a predicate filter, where, as usual, it represents a child element that must be present. In the following example, a Note element is only selected if it directly contains a Title element; it makes no difference whether one or many titles are present:

note[title]

The content of an element can also be compared to a fixed string value:

note[title="first note"]

   <note>
     <title>first note</title>
     ...
   </note>
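Both of these tests fall within the XPath subset understood by Python's 'xml.etree.ElementTree' module, so they can be tried directly. The following is a minimal sketch; the sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two titled notes and one untitled note.
doc = ET.fromstring(
    "<notes>"
    "<note><title>first note</title><para>...</para></note>"
    "<note><title>second note</title></note>"
    "<note><para>untitled</para></note>"
    "</notes>"
)

titled = doc.findall("note[title]")                    # any Title child
first_note = doc.findall("note[title='first note']")   # Title with given text

print(len(titled))      # 2
print(len(first_note))  # 1
```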

Attribute tests

Attributes can be tested. An attribute name is distinguished from an element name using a prefix '@' symbol. The following example selects every paragraph with a Type attribute value of 'secret':

para[@type='secret']

The verbose equivalent of '@' uses the 'attribute::' axis:

para[attribute::type='secret']
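The abbreviated attribute test is also supported by Python's 'xml.etree.ElementTree' module (the verbose 'attribute::' form is not). A small sketch, using an invented sample document:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the first paragraph is marked as secret.
doc = ET.fromstring(
    '<chapter>'
    '<para type="secret">hidden</para>'
    '<para type="public">open</para>'
    '<para>plain</para>'
    '</chapter>'
)

secret = doc.findall("para[@type='secret']")

print([p.text for p in secret])  # ['hidden']
```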

Boolean tests

Many expressions, including those described above, are either true or false in a given circumstance. A boolean test is performed, with a result of 'true' or 'false'. The tests shown above are only successful if the expression returns a 'true' value. The 'not()' function can be used to reverse the result, and so greatly extends the number of tests that can be made. For example, all notes except for the third one, notes that do not contain a Title element, and all chapters except for one with a specified identifier (assuming here that the identifier is held in an attribute named 'id'), can be selected in this way:

note[not(position() = 3)]

note[not(title)]

chapter[not(@id = "summary")]

The 'boolean()' function evaluates an embedded expression and returns a boolean value. All valid numbers except zero are considered to be true, and a node list that contains at least one entry is also true. A text string is true if it contains at least one character. All the following tests return 'true' (assuming, in the last case, that there is at least one Title element):

boolean(3)

boolean("some text")

boolean(title)

It follows that the following expression returns false:

not(boolean(3))
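These conversion rules can be summarised in a few lines of Python. The function below is a hypothetical model written for this illustration, not part of any XPath library; it represents a node list as an ordinary Python list.

```python
import math

def xpath_boolean(value):
    """Model of the boolean() conversion rules described above."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Zero and Not-a-Number are false; every other number is true.
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):
        # A string is true if it contains at least one character.
        return len(value) > 0
    # Anything else is treated as a node list: true if not empty.
    return len(value) > 0

print(xpath_boolean(3))             # True
print(xpath_boolean("some text"))   # True
print(xpath_boolean([]))            # False (no Title elements found)
print(not xpath_boolean(3))         # False
```

One consequence worth noting is that the string "false" is true, because it is not empty.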

Comparisons also return a boolean result. The most obvious comparison is for equality, using the equals symbol, '='. Examples of such comparisons appeared above. In the following example, when the position of the current Note element matches the value '3', then a 'true' value is returned:

note[position() = 3]

The example above compares two numbers for equality, but it is also possible to compare boolean expressions and strings:

note[title = "first note"]

Testing for non-equality is possible using '!='. For example, to select all but the last note:

note[position() != last()]

Other comparisons can be made that require the expressions to be interpreted as numbers. These are tests for the first expression being greater than the second, using '>', and the other way around, using '<'. By combining symbols, it is also possible to test whether the first expression is greater than or equal to the second one ('>='), or less than or equal to it ('<='). The two examples below are equivalent, as they both filter out the first two Note elements:

note[position() > 2]

note[position() >= 3]

Note that the '<' and '>' symbols are significant in XML. When inserting expressions into an attribute, it is necessary to remember to use '&lt;' and '&gt;' to represent these characters (except when using an XML-sensitive editor, which performs this translation on behalf of the author):

note[position() &gt; 2]
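When expressions are generated by a program rather than typed by hand, the translation can be left to a library. The sketch below uses the 'escape()' and 'quoteattr()' functions from Python's standard 'xml.sax.saxutils' module; the expression itself is only an example.

```python
from xml.sax.saxutils import escape, quoteattr

expr = "note[position() > 2 and position() < last()]"

# escape() replaces '&', '<' and '>' with entity references.
escaped = escape(expr)

# quoteattr() does the same and also wraps the result in quotes,
# ready to be written out as an attribute value.
attribute_text = quoteattr(expr)

print(escaped)         # note[position() &gt; 2 and position() &lt; last()]
print(attribute_text)  # "note[position() &gt; 2 and position() &lt; last()]"
```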

An expression can be divided into separate sub-expressions, and the whole expression can be considered to be true only if all the sub-expressions individually evaluate to true, using an 'and' expression. In the following example, a Note element is selected only if it is preceded by at least two others, and followed by at least one more:

note[position() > 2 and position() < last()]

Alternatively, the whole expression may succeed when at least one of the sub-expressions is true, using an 'or' expression. In the following example, both the second and fourth Note elements are selected:

note[position() = 2 or position() = 4]
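The effect of the 'and' and 'or' tests on a list of matches can be modelled with plain Python lists. This is only a sketch of the logic, with made-up note names standing in for the selected elements; positions count from one, as they do in XPath.

```python
# Hypothetical context list of five Note elements.
notes = ["n1", "n2", "n3", "n4", "n5"]
last = len(notes)

# Model of note[position() > 2 and position() < last()]
middle = [n for pos, n in enumerate(notes, start=1)
          if pos > 2 and pos < last]

# Model of note[position() = 2 or position() = 4]
second_and_fourth = [n for pos, n in enumerate(notes, start=1)
                     if pos == 2 or pos == 4]

print(middle)             # ['n3', 'n4']
print(second_and_fourth)  # ['n2', 'n4']
```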

Finally, the 'true()' and 'false()' functions simply return a value of true or false respectively.

Strings

String objects can be analysed to discover if they contain specific characters or sub-strings. The 'contains()' function returns 'true' if the string contains the given text. The first parameter is the string to test. The second parameter is the string to find in the first string. In the following example, each Note element is tested to see if it contains the word 'note':

note[contains(text(), "note")]


   <note>This is a note.</note>

Note that, in this example, the string to test is the child node of the Note element, which of course needs to be a text node, as represented by 'text()' in the first parameter. But only the first text node is tested. The test will fail if the word is actually in a sub-element, or in another text node that follows a sub-element. The safer way to use this function is to refer to the Note element itself, using '.' (the current node). The string value of an element is the concatenation of all the text it contains, including text held in descendant elements, so all of this text is analysed:

note[contains(., "note")]


   <note>This is a <emph>note</emph>.</note>
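The difference between the two tests can be seen by looking at the strings involved. In the following Python sketch, which assumes the standard 'xml.etree.ElementTree' module, the 'text' property holds only the text before the first sub-element, while 'itertext()' walks all the text inside the element.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring("<note>This is a <emph>note</emph>.</note>")

# Roughly what contains(text(), "note") examines: the first text node only.
first_text_node = note.text or ""

# Roughly what contains(., "note") examines: all text inside the element.
string_value = "".join(note.itertext())

print("note" in first_text_node)  # False
print("note" in string_value)     # True
```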

If the specified text needs to appear at the start of the string, the function 'starts-with()' should be used instead:

note[starts-with(., "Note")]


   <note>Note to myself</note>

The 'string()' function converts an embedded expression into a string. For example, the following test selects Note elements that contain the character '2':

note[contains(., string(2))]


   <note>This is note number 2</note>

When converting a number into a string, an invalid number is converted to the string 'NaN' (Not-a-Number), and an infinite value is converted to the string 'Infinity' (negative infinity becomes '-Infinity' and negative zero just becomes '0').

The 'translate()' function converts characters according to a mapping scheme. The first parameter is the string to convert. The second parameter lists the characters to modify in the source text. The third parameter lists the replacement values, in the same order. One use for this function is to allow case-insensitive text comparisons, as in the following example, which matches both of the Note elements below:

note[starts-with(
        translate(.,
          "abcdefghijklmnopqrstuvwxyz",
          "ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
              "NOTE")]


   <note>Note to myself</note>

   <note>NOTE: ...</note>

Characters in the second parameter that have no counterpart in the third parameter are removed from the source string. To convert semicolons to commas, while also removing all existing plus symbols from a string, the following would be used:

translate(., ";+", ",")
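The mapping and removal rules can be modelled in Python. The function name 'xpath_translate' is invented for this sketch; it builds a table for the built-in 'str.translate()' method, mapping each character to its counterpart, or to 'None' (meaning removal) when there is no counterpart.

```python
def xpath_translate(source, from_chars, to_chars):
    """Model of translate(): map by position, remove unmatched characters."""
    table = {}
    for index, char in enumerate(from_chars):
        if ord(char) in table:
            continue  # the first occurrence of a repeated character wins
        # Characters beyond the end of to_chars are mapped to None (removed).
        table[ord(char)] = to_chars[index] if index < len(to_chars) else None
    return source.translate(table)

print(xpath_translate("a;b+c;d", ";+", ","))   # a,bc,d
print(xpath_translate("Note to myself",
                      "abcdefghijklmnopqrstuvwxyz",
                      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))  # NOTE TO MYSELF
```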

A leading or trailing fragment of a string can be extracted, providing that the fragment ends or begins with a given character or sequence of characters. The 'substring-before()' function takes two parameters: the string to extract text from, then the character or characters that terminate the prefix to be extracted. The 'substring-after()' function works in the same way, but returns the text that follows the first occurrence of the given characters. Because only the first occurrence is considered, the function is applied twice in the following example to retrieve just the year from a date:

substring-after( substring-after( ., "/" ), "/" )


   <date>12/08/1999</date>
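The behaviour of these two functions, which both split a string at the first occurrence of the separator, can be modelled in Python. The function names below are invented for this sketch. Both return an empty string when the separator is not found.

```python
def substring_before(source, separator):
    """Text before the first occurrence of separator, or '' if absent."""
    index = source.find(separator)
    return source[:index] if index >= 0 else ""

def substring_after(source, separator):
    """Text after the first occurrence of separator, or '' if absent."""
    index = source.find(separator)
    return source[index + len(separator):] if index >= 0 else ""

date = "12/08/1999"
print(substring_before(date, "/"))                       # 12
print(substring_after(date, "/"))                        # 08/1999
print(substring_after(substring_after(date, "/"), "/"))  # 1999
```

A single call to 'substring-after()' therefore leaves the month and year together; the second call is needed to isolate the year.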

To extract any fragment of a string, the 'substring()' function takes three parameters: the source string, the character offset position (counting from one) and the number of characters to extract:

note[substring(., 9, 5) = "XPath"]


   <note>This is XPath</note>

   <note>This XPath is not a match</note>
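The main point to remember is that the offset counts from one, whereas most programming languages count from zero. The Python sketch below shows the adjustment; the function name is invented, and it deliberately handles only positive whole-number parameters, not the full rounding rules of XPath.

```python
def xpath_substring(source, start, length):
    """1-based substring, for positive whole-number parameters only."""
    begin = start - 1  # convert the 1-based offset to Python's 0-based index
    return source[begin:begin + length]

print(xpath_substring("This is XPath", 9, 5))              # XPath
print(xpath_substring("This XPath is not a match", 9, 5))  # th is
```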

When using namespaces (see Chapter 10), element names are separated into two parts: a local part, such as 'h1', and a namespace prefix part, such as 'html', giving a complete name of 'html:h1'. The prefix is mapped to a URL, such as 'http://www.w3.org/Profiles/xhtml1-strict'. The 'namespace-uri()' function returns the URL of the first node in the list that forms its parameter. The 'local-name()' function returns the local part of the name:

*[namespace-uri(.) =
            "http://www.w3.org/Profiles/xhtml1-strict"]


   <html:h1>An HTML Header One</html:h1>
   <html:p>An HTML paragraph.</html:p>


*[local-name(.) = "score"]


   <music:score>Ferde Grofé</music:score>
   <competition:score>57</competition:score>
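A similar separation of names can be seen in Python's 'xml.etree.ElementTree' module, which stores each element name in the form '{URL}local-part'. The sketch below splits that form into its two parts. The helper name 'split_name' and the two namespace URLs are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URLs, declared on an invented wrapper element.
doc = ET.fromstring(
    '<results xmlns:music="urn:example:music" '
    'xmlns:competition="urn:example:competition">'
    "<music:score>Ferde Grofé</music:score>"
    "<competition:score>57</competition:score>"
    "</results>"
)

def split_name(elem):
    """Return (namespace URL, local part) from a '{URL}local' tag."""
    if elem.tag.startswith("{"):
        url, local = elem.tag[1:].split("}", 1)
        return url, local
    return "", elem.tag

# Model of *[local-name(.) = "score"]: matches both elements.
scores = [e for e in doc if split_name(e)[1] == "score"]

# Model of *[namespace-uri(.) = "urn:example:music"]: matches only one.
music_only = [e for e in doc if split_name(e)[0] == "urn:example:music"]

print(len(scores))      # 2
print(len(music_only))  # 1
```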

The 'normalize-space()' function removes leading and trailing spaces, and reduces a sequence of whitespace characters down to a single space character. The following Note element matches, despite the leading and additional embedded spaces:

note[starts-with(normalize-space(.), "Hello there")]


   <note>   Hello    there</note>
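The same clean-up can be sketched in Python with a regular expression. The function name is invented for this example, and it treats only space, tab, carriage return and line feed as whitespace, which are the characters XML recognises as such.

```python
import re

XML_WHITESPACE = " \t\r\n"

def normalize_space(source):
    """Trim the ends and collapse each whitespace run to one space."""
    trimmed = source.strip(XML_WHITESPACE)
    return re.sub(r"[ \t\r\n]+", " ", trimmed)

cleaned = normalize_space("   Hello    there")
print(cleaned)                             # Hello there
print(cleaned.startswith("Hello there"))   # True
```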

A number of strings can be concatenated into a single string, using the 'concat()' function, which takes two or more string parameters:

concat("Original string.", " Append this", " and this.")

This can be used with substrings to create fixed-length strings. For example, to ensure that a string is exactly ten characters in length, by padding with spaces if necessary:

substring(concat(node(), "          "), 1, 10)

Finally, the number of characters in a string can be determined using the 'string-length()' function:

note[string-length(.) = 15]


   <note>fifteen letters</note>
   <note>123456789012345</note>

Numbers

Objects can be converted to numbers, using the 'number()' function. Boolean expressions are interpreted as '1' for 'true' and '0' for 'false'. Strings that cannot be interpreted as a valid number are translated to a special default value called 'Not-a-Number' (or 'NaN' for short).
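These conversions can be modelled in Python. The function below is a hypothetical sketch: it accepts only plain decimal notation with an optional leading minus sign and surrounding whitespace, which is the form XPath 1.0 recognises, so a string such as '1e3' is treated as not being a number.

```python
import math
import re

# Optional whitespace, optional minus sign, digits with an optional fraction.
_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")

def xpath_number(value):
    """Model of number() for booleans, numbers and strings."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if _NUMBER.fullmatch(value):
        return float(value.strip(" \t\r\n"))
    return math.nan  # Not-a-Number

print(xpath_number(True))       # 1.0
print(xpath_number(" 12.5 "))   # 12.5
print(xpath_number("abc"))      # nan
```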

Real numbers can be converted to integers. Using the 'round()' function, the real number is rounded up or down to the nearest integer equivalent. Using the 'floor()' function, the number is rounded down to the nearest integer, so '3.9' becomes '3', and using the 'ceiling()' function, the number is rounded up, so '3.1' becomes '4'.
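The three functions correspond closely to functions in Python's 'math' module, with one difference worth knowing: when a value lies exactly halfway between two integers, XPath 1.0 rounds towards positive infinity, whereas Python's built-in 'round()' rounds to the nearest even integer. The helper 'xpath_round' below is an invented name for a simple model of the XPath behaviour.

```python
import math

def xpath_round(value):
    """Round to the nearest integer; halves go towards positive infinity."""
    return math.floor(value + 0.5)

print(math.floor(3.9))    # 3
print(math.ceil(3.1))     # 4
print(xpath_round(2.5))   # 3
print(xpath_round(-2.5))  # -2
print(round(2.5))         # 2 (Python's own rule, shown for contrast)
```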

The '+' and '-' operators may be used, as well as '*' for multiplication, and the following four examples are all equivalent:

note[ 4 ]
note[ 2 + 2 ]
note[ 6 - 2 ]
note[ 2 * 2 ]

The 'mod' operator supplies the remainder of a truncated division. For example, '9 mod 4' returns '1' (there is a remainder of one after dividing nine by four). This feature is useful for selecting alternate items, such as even numbered paragraphs:

para[ position() mod 2 = 0 ]

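The 'div' operator performs division and returns the quotient as a real number. For example, '9 div 4' returns '2.25'. The whole number of times that four goes into nine can be obtained using 'floor(9 div 4)', which returns '2'.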
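The following Python sketch shows the nearest equivalents of the two operators. It is only an analogy: 'math.fmod()' is used for 'mod' because, like the XPath operator, it takes the remainder of a truncated division, while Python's own '%' operator behaves differently for negative numbers.

```python
import math

quotient = 9 / 4                    # like '9 div 4'
whole_part = math.floor(9 / 4)      # like 'floor(9 div 4)'
remainder = math.fmod(9, 4)         # like '9 mod 4'
negative_remainder = math.fmod(-9, 4)   # like '-9 mod 4'
python_percent = -9 % 4             # Python's '%', shown for contrast

print(quotient)            # 2.25
print(whole_part)          # 2
print(remainder)           # 1.0
print(negative_remainder)  # -1.0
print(python_percent)      # 3
```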

Precedence

The operators introduced above ('*', 'mod', 'div', '+', '-', '=', '!=', '<', '>', '<=', '>=', 'and' and 'or') are not processed in a simple left-to-right manner. Some have higher precedence than others.

Starting with the highest, the precedence levels are:

'*' and 'div' and 'mod'
'+' and '-'
'<', '>', '<=' and '>='
'=' and '!='
'and'
'or'

The lower the precedence of an operator, the later it is applied, and so the larger the part of the expression it governs. Because 'or' is processed last, it is very significant: everything to the left of the 'or' operator is calculated, then everything to the right of it, and finally the whole expression succeeds if either sub-expression is true.

Similarly, the '+' and '-' operators are always dealt with before '='. For example, the expression '4-1 = 5-2' returns true, because '3 = 3'.
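The effect of these rules can be illustrated with Python, whose arithmetic, 'and' and 'or' operators happen to follow the same relative order. This is an analogy only: Python differs from XPath in other respects, such as its treatment of chained comparisons.

```python
# Arithmetic is carried out before the equality test: (4 - 1) = (5 - 2)
arithmetic_first = (4 - 1 == 5 - 2)

# 'and' is applied before 'or', so 'a or b and c' means 'a or (b and c)'.
a, b, c = True, False, False
implicit = a or b and c
grouped_by_precedence = a or (b and c)
grouped_left_to_right = (a or b) and c

print(arithmetic_first)        # True
print(implicit)                # True
print(grouped_by_precedence)   # True
print(grouped_left_to_right)   # False
```

The last line shows the result that a simple left-to-right reading would give, which is why the distinction matters.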

Multiple filters

Multiple predicate filters are needed when an abbreviated position test is combined with another type of test, because the abbreviated number must stand alone in its predicate. The following example first selects company names, then extracts the third name in this list:

child::name[company][3]


   <names>
     <name><person>...</person></name>
     <name><person>...</person></name>
     <name><company>...</company></name>
     <name><company>...</company></name>
     <name><person>...</person></name>
     <name><company>...</company></name>
     <name><person>...</person></name>
   </names>

The order in which these two tests are carried out is very important. Only elements that successfully pass the first test are subjected to the second. Reversing the order of the tests in the example above therefore produces a very different result. This time, the third name is selected, providing that it is also a company name:

child::name[3][company]


   <names>
     <name><person>...</person></name>
     <name><company>...</company></name>
     <name><company>...</company></name>
     <name><person>...</person></name>
   </names>

   <names>
     <name><person>...</person></name>
     <name><company>...</company></name>
     <name><person>...</person></name> <!-- NOT SELECTED -->
   </names>
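The importance of the order can be demonstrated by modelling each predicate as a separate filtering step in Python. This is a sketch of the logic rather than a real XPath evaluation: the lists are filtered by hand, and the placeholder contents (P1, C1 and so on) are invented so that the results can be told apart.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<names>"
    "<name><person>P1</person></name>"
    "<name><person>P2</person></name>"
    "<name><company>C1</company></name>"
    "<name><company>C2</company></name>"
    "<name><person>P3</person></name>"
    "<name><company>C3</company></name>"
    "<name><person>P4</person></name>"
    "</names>"
)
names = doc.findall("name")

def has_company(name):
    """Model of the [company] predicate."""
    return name.find("company") is not None

# Model of name[company][3]: filter first, then take the third survivor.
filter_then_index = [n for n in names if has_company(n)][2:3]

# Model of name[3][company]: take the third name, then apply the filter.
index_then_filter = [n for n in names[2:3] if has_company(n)]

print([n.find("company").text for n in filter_then_index])  # ['C3']
print([n.find("company").text for n in index_then_filter])  # ['C1']
```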

Multiple predicate filters are also useful in other circumstances, although in many cases a single filter can be used that includes the 'and' token instead. The second example above can be reformulated as follows:

child::name[position() = 3 and company]
