Concatenating and Splitting Strings

Five functions, summarized in Table 17-6, concatenate and split apart strings.

Table 17-6. Functions that concatenate and split apart strings

Name                    Description
concat                  Concatenates two or more strings
string-join             Concatenates a sequence of strings, optionally using a separator
tokenize                Breaks a single string into a sequence of strings, using a separator specified as a regular expression
codepoints-to-string    Converts a sequence of Unicode code-point values to a string
string-to-codepoints    Converts a string to a sequence of Unicode code-point values

Concatenating Strings

Strings can be concatenated using one of two functions: concat or string-join. XQuery 1.0 has no string concatenation operator, so operators such as +, &, or || cannot be used to join strings (XQuery 3.0 later added || for this purpose). The concat function accepts individual string arguments and concatenates them. It is unique among the built-in functions in that it accepts a variable number of arguments. For example:

concat("a", "b", "c")

returns the string abc. The string-join function, on the other hand, accepts a sequence of strings. For example:

string-join( ("a", "b", "c"), "")

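also returns the string abc. The second argument of string-join is a separator that is placed between the strings; the zero-length string in the previous example joins them with nothing in between. For example, with a slash as the separator: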

string-join( ("a", "b", "c"), "/")

returns the string a/b/c.
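The two functions are often combined when the pieces to be joined are computed rather than written as literals. The sketch below assumes an XQuery 1.0 processor and uses a made-up sequence of directory names ($dirs) purely for illustration. It also shows that concat accepts non-string atomic values (converting them to strings) and treats an empty-sequence argument as a zero-length string, and that string-join returns a zero-length string for an empty input sequence.

```xquery
(: $dirs is a hypothetical sequence of path steps, for illustration only :)
let $dirs := ("usr", "local", "bin")
return (
  (: string-join puts "/" between the items; concat adds the leading slash :)
  concat("/", string-join($dirs, "/")),                 (: "/usr/local/bin" :)

  (: concat converts atomic values such as integers to strings
     and treats an empty-sequence argument as "" :)
  concat("Total: ", 42, ()),                            (: "Total: 42" :)

  (: string-join of an empty sequence is a zero-length string :)
  string-join((), ", "),                                (: "" :)

  (: XQuery 1.0's string-join expects strings, so convert numbers first :)
  string-join(for $i in 1 to 3 return string($i), "-")  (: "1-2-3" :)
)
```

The query returns a sequence of four strings, the third of which is empty. XQuery 3.1 relaxed string-join to accept any atomic values, so the explicit string() call matters mainly for older processors.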

Splitting Strings Apart

Strings can be split apart, or tokenized, using the tokenize function. This function breaks a string into a sequence of strings, using a regular expression to designate the separator character(s). For example:

tokenize("a/b/c", "/")

returns a sequence of three strings: a, b, and c. Regular expressions such as \s, which represents a whitespace character (space, line feed, carriage return, or tab), and \W, which represents a nonword character (a punctuation mark, a separator such as a space, or a control character), are often used with this function. A separator at the start or end of the string, or two separators in a row, produces a zero-length string in the result. A list of useful regular expressions for tokenization can be found in Appendix A in the discussion of the tokenize function. Table 17-7 shows some examples of the tokenize function.

Table 17-7. Examples of the tokenize function

Example                                      Return value
tokenize("a b c", "\s")                      ("a", "b", "c")
tokenize("a b c", "\s+")                     ("a", "b", "c")
tokenize("a-b--c", "-")                      ("a", "b", "", "c")
tokenize("-a-b-", "-")                       ("", "a", "b", "")
tokenize("a/ b/ c", "[/\s]+")                ("a", "b", "c")
tokenize("2006-12-25T12:15:00", "[\-T:]")    ("2006", "12", "25", "12", "15", "00")
tokenize("Hello, there.", "\W+")             ("Hello", "there", "")
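Because tokenize keeps the zero-length strings produced by adjacent, leading, or trailing separators (as several rows of Table 17-7 show), real-world input often needs a little cleanup. The following sketch uses a hypothetical comma-separated value, $line, with stray spaces and an empty field; it trims the input with normalize-space, splits on a comma with optional surrounding whitespace, and filters out empty tokens with a predicate. It also extracts the year from the date/time string used in the table.

```xquery
(: $line is a hypothetical comma-separated value with stray spaces
   and an empty field, used only for illustration :)
let $line := "  red, green,,blue  "
(: normalize-space trims the ends; the pattern matches a comma plus
   any surrounding whitespace; [. ne ""] drops zero-length tokens :)
let $tokens := tokenize(normalize-space($line), "\s*,\s*")[. ne ""]
return (
  count($tokens),                                            (: 3 :)
  string-join($tokens, "|"),                                 (: "red|green|blue" :)
  (: first token of the date/time example, as a number :)
  xs:integer(tokenize("2006-12-25T12:15:00", "[\-T:]")[1])   (: 2006 :)
)
```

The pattern \s*,\s* always requires a comma. A pattern that can match a zero-length string, such as \s* on its own, causes tokenize to raise an error instead.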

Converting Between Code Points and Strings

Strings can be constructed from a sequence of Unicode code-point values (expressed as integers) using the codepoints-to-string function. For example:

codepoints-to-string((97, 98, 99))

returns the string abc. The string-to-codepoints function performs the opposite; it converts a string to a sequence of code points. For example:

string-to-codepoints("abc")

returns a sequence of three integers: 97, 98, and 99.
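Converting to code points makes character-level manipulation possible with ordinary sequence and arithmetic operations. The sketch below, using "abc" as sample data, reverses a string with the built-in reverse function and shifts each character by one code point. The shift is only an illustration: it does no wrap-around, so it is not a real cipher.

```xquery
(: "abc" is sample data only :)
let $cps := string-to-codepoints("abc")                  (: (97, 98, 99) :)
return (
  (: reverse the order of the characters :)
  codepoints-to-string(reverse($cps)),                   (: "cba" :)

  (: add 1 to each code point; no wrap-around, so "z" would become "{" :)
  codepoints-to-string(for $cp in $cps return $cp + 1),  (: "bcd" :)

  (: converting back yields the original string :)
  codepoints-to-string($cps) eq "abc"                    (: true :)
)
```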
