Chapter 11. A Closer Look at Types

Chapter 2 briefly introduced the use of types in XQuery. This chapter delves deeper into the XQuery type system and its set of built-in types. It explains the automatic type conversions performed by the processors and describes the expressions that are relevant to types, namely type constructors, cast and castable expressions, and instance of expressions.

The XQuery Type System

XQuery is a strongly typed language, meaning that each function and operator expects its arguments or operands to be of a particular type. This means, for example, that you cannot perform arithmetic operations on strings without explicitly telling the processor to treat your strings like numbers. This is similar to some common object-oriented programming languages, like Java and C#. It is in contrast to most scripting languages, like JavaScript, which automatically coerce values to the appropriate type.
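The following query is a minimal sketch of this rule. The variable name and its value are hypothetical and were chosen only for illustration. Adding 1 directly to the string "12" would raise a type error, so the query first converts the string with the xs:integer constructor function.

```xquery
(: Hypothetical value: a quantity that happens to be stored as a string. :)
let $quantity := "12"
(: $quantity + 1 would raise a type error (XPTY0004), because xs:string
   is not a numeric type. The constructor function converts it first. :)
return xs:integer($quantity) + 1
```

This query returns 13.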

Advantages of a Strong Type System

There are several advantages to a strong type system. One of them is the early and reliable identification of errors in a query. Potential errors in the query can be detected before the query is even executed. For example, if you are trying to double a value that is a string (e.g., a product name), there is probably an error in the query. In addition, a type system allows for the identification of errors in the values of input data. This identification of errors can make queries easier to debug and can result in more reliable queries that are able to handle a variety of input data. This is especially true if schemas are used, because schema types can help identify possible errors. A schema allows the processor to tell you that the product name is a string and that you should not be trying to double it. Based on a schema, the processor can also tell you when you've specified a path that will never return any elements, for example because of a misspelling or an invalid chain of steps.

Another advantage of a strong type system is optimization. Implementations can optimize performance if they know more about the types of data. This too is especially true if schemas are used, because schema types can help a processor find specific elements. If your schema says that all number elements appear as children of product elements, your processor only has to look in one place for the number elements you have requested in your query. If it knows that there is always only one number per product, it can further optimize certain comparison operations.

A strong type system has its disadvantages, too. One is that it can complicate query authoring, because you must pay more attention to types. For example, if you want to treat a numeric value like a string, you have to explicitly cast it to xs:string in order to perform string-related operations. Also, supporting an extensive type system can put a burden on implementers of the standard. This is why the more complex features (schema awareness and static typing) are optional in the standard and will not be available in all implementations.
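As a minimal sketch (the number 557 is an arbitrary illustration), the substring function expects an xs:string argument, and an integer is not converted to a string automatically, so the value is cast to xs:string first:

```xquery
let $number := 557
(: substring($number, 1, 2) would raise a type error, because an
   xs:integer argument is not promoted to xs:string. :)
return substring(xs:string($number), 1, 2)
```

This query returns the string "55".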

Do You Need to Care About Types?

If you do not use schemas, your input data will be untyped. Usually, this means that you, as a query author, do not need to be especially concerned about types. Because of the type conversions described in "Automatic Type Conversions," later in this chapter, the processor will usually "do the right thing" with your data.

For example, you may pass an untyped price element to the round function, or multiply it by two. In these cases, the processor will automatically assume that the content of the price element is numeric and convert it to a numeric type. Likewise, if you call the substring function with a name element, the processor will assume that name contains a string.
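The query below sketches these cases. Instead of reading catalog.xml, it constructs hypothetical price and name elements inline. Their content is untyped (its typed value is xs:untypedAtomic), just like the content of unvalidated input documents.

```xquery
(: Hypothetical untyped elements, constructed inline for illustration. :)
let $price := <price>29.50</price>
let $name := <name>Fleece Pullover</name>
return (
  round($price),          (: untyped value cast to xs:double: 30 :)
  $price * 2,             (: also cast to xs:double: 59 :)
  substring($name, 1, 6)  (: untyped value treated as a string: "Fleece" :)
)
```

No explicit casts are needed in any of the three expressions.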

There is the occasional "gotcha," though. One example is comparing two untyped values using general comparison operators (e.g., < or =). If the values are untyped, they are compared as strings. Therefore, if you compare the untyped price element <price>123.99</price> with the untyped price element <price>99.99</price>, the second will be considered greater, because its string value starts with a greater digit ("9" rather than "1"). Similarly, order by clauses in FLWORs assume that untyped values are strings rather than numbers. In both of these cases, the prices need to be explicitly converted to numbers in order to be sorted or compared as numbers. Casting is described in "Constructors and Casting," later in this chapter.
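Here is a minimal sketch of the gotcha and its fix. It again uses hypothetical price elements constructed inline rather than values read from catalog.xml. The cast shown is to xs:decimal. Casting to xs:double would work as well.

```xquery
let $prices := (<price>123.99</price>, <price>99.99</price>)
return (
  (: Both operands are untyped, so they are compared as strings: true :)
  $prices[2] > $prices[1],
  (: After an explicit cast they are compared as numbers: false :)
  xs:decimal($prices[2]) > xs:decimal($prices[1]),
  (: Casting in the order by clause sorts numerically: "99.99", "123.99" :)
  for $p in $prices
  order by xs:decimal($p)
  return string($p)
)
```

Without the cast in the order by clause, "123.99" would sort before "99.99".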

With untyped text values, you also need to be careful when using the max and min functions. These two functions treat untyped values as numbers, casting them to xs:double. Therefore, the expression:

max(doc("catalog.xml")//name)

will raise an error. Instead, you need to cast the names to xs:string. One way to do this is to use the string function, as in:

max(doc("catalog.xml")//name/string( ))

If you do use schemas, you will be able to get more of the benefits of strong typing, but you will need to pay more attention to types when writing your query. Unlike some weakly typed languages, XQuery will not automatically convert values of one type to an unrelated type (for example, a string to a number). So, if your schema for some reason declares the price element to be of type xs:string, you will not be able to perform arithmetic operations, or call functions like round, on your price without explicitly casting it to a numeric type.
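The sketch below assumes, hypothetically, that a schema has declared price as xs:string. No schema is involved here, so a typed let variable holding a string stands in for the validated element. Calling round on it directly would raise a type error, so the value is cast to xs:decimal first.

```xquery
(: Stand-in for a price element whose schema type is xs:string. :)
let $price as xs:string := "29.99"
(: round($price) would raise a type error: xs:string is not numeric and,
   unlike untyped data, is never converted to a number automatically. :)
return round(xs:decimal($price) * 2)
```

This query returns 60.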
