Full-Text Search

Search facilities have become an increasingly important (and complex) tool to locate relevant information in the vast amount of data that is now structured as XML, whether it is in large text databases or on the Web itself. Searching is a natural use case for XQuery because of its built-in knowledge of XML structures and its syntax, which can be written by reasonably nontechnical users.

XQuery 1.0 contains some limited functionality for searching text. For example, you can use the contains or matches function to search for specific strings inside element content. However, these functions match only literal substrings or regular expressions, not words and phrases, so they fall short for text-heavy XML documents.
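As a minimal sketch of what the built-in functions can do, the query below searches element content with contains and matches. The sample data is hypothetical; only the element names are borrowed from the full-text example later in this section.

```xquery
(: Hypothetical sample data, for illustration only :)
let $books :=
  <books>
    <book>
      <title>Designing for the Web</title>
      <content>Usability matters for every web site.</content>
    </book>
    <book>
      <title>Cooking Basics</title>
      <content>How to boil an egg.</content>
    </book>
  </books>
return (
  (: contains does a case-sensitive substring test :)
  $books/book[contains(content, "web site")]/title,
  (: matches tests a regular expression; the "i" flag ignores case :)
  $books/book[matches(content, "usabil", "i")]/title
)
```

Both expressions select the title of the first book. Neither knows anything about word boundaries, stems, or relevance, which is the gap the Full-Text operators are meant to fill.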

The W3C XQuery Working Group is working on a separate recommendation entitled XQuery 1.0 and XPath 2.0 Full-Text that provides specialized operators for full-text searching. These operators will be additions to the XQuery 1.0 syntax, and they will not be supported by all XQuery implementations.

The Full-Text recommendation, currently a working draft, supports the following search functionality:

Boolean operators

Combining search terms using && (and), || (or), ! (not), and not in (mild not)

Stemming

Finding words with the same linguistic stem, for example, finding both "mouse" and "mice" when the search term is "mouse"

Weighting

Specifying weights (priorities) for different search terms

Proximity and order

Specifying how far apart the search terms may be, and in what order

Scope

Searching for multiple terms within the same sentence or paragraph

Score and relevance

Determining how relevant the results are to the terms searched

Occurrences

Restricting results to search terms that appear a specific number of times

Thesaurus

Specifying synonyms for search terms

Case-(in)sensitivity

Considering uppercase versus lowercase letters either relevant or irrelevant

Diacritics-(in)sensitivity

Considering, for example, accents on characters either relevant or irrelevant

Wildcards

Specifying wildcards in search terms, such as run.* to match all words that start with "run"

Stopwords

Specifying common words to exclude from searches, such as "a" and "the"

An example of a full-text query, taken from the Full-Text recommendation, is shown in Example 22-3.

Example 22-3. Full-text query example

for $b score $s in /books/book[content ftcontains ("web site" weight 0.2)
                                                   && ("usability" weight 0.8)]
where $s > 0.5
order by $s descending
return <result>
          <title> {$b//title} </title>
          <score> {$s} </score>
       </result>

This example uses a familiar FLWOR syntax, but with some additional operators and clauses:

  • The score $s in the for clause is used to specify that the variable $s should contain the relevance score of the results. This variable is then used to constrain the results to those where the score is greater than 0.5, and also to sort the results, with the most relevant appearing first.

  • The ftcontains operator is used to find text containing the specific search terms "web site" and "usability."

  • The && operator is used to find the intersection of the two search terms, returning only books whose content contains both terms.

  • The weight keyword is used to weight the individual search terms.
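For comparison, an implementation without Full-Text support could approximate only the Boolean part of this query using standard XQuery 1.0 functions. The following is a rough sketch under that assumption, with hypothetical sample data; it has no equivalent of score or weight, so the where and order by clauses of Example 22-3 cannot be reproduced.

```xquery
(: Hypothetical sample data; element names follow Example 22-3 :)
let $doc :=
  <books>
    <book>
      <title>Web Design</title>
      <content>Good usability makes a web site easy to use.</content>
    </book>
    <book>
      <title>Gardening</title>
      <content>Plant tomatoes in spring.</content>
    </book>
  </books>
(: Plain substring tests joined with "and" stand in for ftcontains and &&.
   lower-case makes the comparison case-insensitive. :)
for $b in $doc/book[contains(lower-case(content), "web site")
                    and contains(lower-case(content), "usability")]
return <result>{$b/title}</result>
```

With this data, only the first book satisfies both tests. Because contains matches substrings rather than words, this approach would also match "usability" inside a longer string, and it cannot rank the results.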

Some XQuery implementations, such as Mark Logic and eXist, provide special built-in functions and operators to address some of these full-text requirements. These implementations generally do not follow the W3C recommendation because they were developed before it was publicly available.

For more information on the XQuery Full-Text recommendation, see http://www.w3.org/TR/xquery-full-text.
