Chapter 14

XQuery APIs

14.1 Introduction

In the last few chapters, you read about XQuery 1.0 and XPath 2.0, languages for querying XML documents, collections, and fragments. But a query language cannot stand alone – it must run in some context. You will read in Chapter 15, “SQL/XML,” that the SQL/XML extensions provide an ideal harness for running XQueries in the context of a SQL database. In this chapter, we focus on a more generic XQuery context, the Java context, and describe the proposed JCP standard XQJ.1 The architecture of XQJ follows that of JDBC® (XQJ has been called “JDBC for XQuery”), which in turn follows ODBC. (See Section 14.2.1 for a description of ODBC and JDBC.)

After describing XQJ, we compare it to SQL/XML as an XQuery API. Though most of this chapter is about XQJ in particular, we urge you to consider XQJ (and, to a lesser extent, SQL/XML) as a general template for any future XQuery APIs.

Why an API?

The XQuery language lets you specify a query over an instance of the XQuery Data Model, which produces an instance of the XQuery Data Model. The definition of the XQuery language describes the environment2 in which a query is run – such things as the available documents and collections, the default collection, in-scope variables, and in-scope schema definitions. This context needs to be set up – created and possibly updated – in some way that’s external to XQuery. The input to, and output from, an XQuery need to be defined and processed – in most cases, you want to do something interesting with the result of a query, such as feed it into a report or into some other program. And you want to be able to do all this in some programming language (say, Java). An API (Application Programming Interface) is the interface between a programming language’s statements, expressions, and data types, and some other data model and/or language – in this case, XQuery. An XQuery API is necessary to formulate and execute XQueries from inside a program.

Where?

As an aside, it’s interesting to note that, while XQuery is defined inside the W3C, the XQuery APIs are not. One could argue that a Java API for XQuery should be defined by the W3C (because it’s XQuery), or by the JCP (Java Community Process) community (because it’s Java), or jointly by both standards bodies (because it’s both XQuery and Java). XQJ is actually defined wholly in the JCP community – XQuery is assumed to be a given, untouchable standard around which a Java API is defined.

However, if you look at the names of the people most deeply involved with the W3C’s XQuery and the JCP’s XQJ, you will find there is a great deal of overlap. At the time of writing, Andrew Eisenberg of IBM and Jim Melton of Oracle are the joint spec leads on JSR 225, and they are both also cochairs of the W3C XQuery Working Group. This is not a coincidence.

14.2 Alphabet-Soup Review

Before we look at XQJ in detail, let’s review some of the closely related standards.

14.2.1 ODBC and JDBC

Let’s imagine you have a SQL database, and you want to run a SQL query. You know exactly the query you want to run, and you have typed, say, SELECT title FROM movies WHERE yearReleased = 1985 into a text editor such as Notepad. Now what? To execute the query, you must run some sort of program – perhaps your database vendor supplies one. In Oracle’s case, the simplest program for running queries is SQL*Plus, so you might run the SQL*Plus program, entering a username, password, and connect string (which describes how to connect to a particular database instance, possibly over some network connection) when SQL*Plus asks for them. Then you paste in the query, and you see the results – a list of titles, nicely formatted with column headings, pagination, a row count, etc.

SQL*Plus is fine for running simple queries and eyeballing the results, but in general you want to be able to write your own program that does all the things SQL*Plus does (simplistically) for you.3 You want to specify your own connection details, execute a query, and then process results in whatever programming language you choose. It is possible to call SQL*Plus from your program and interpret its results,4 but in general you want to be able to access the database directly from your program. Most vendors supply an API so that you can do just that – in Oracle’s case, this is OCI,5 the Oracle Call Interface. OCI is a set of database access and retrieval functions, supplied as a dynamic runtime library, that can be linked into your C or C++ program. Your application can call these functions to connect to an Oracle database, manipulate data, and run queries, returning results that can be understood and processed by the calling program. That’s a proprietary Oracle database API.

Now let’s imagine that you want to access data in several databases, and not all of them are Oracle databases. Or perhaps you are writing an application that is database-independent, leaving the choice of underlying storage to your customer. Now you need to write data access routines that can be understood by any SQL database. Enter ODBC, Open DataBase Connectivity, produced by Microsoft (in partnership with Simba) in 1992 to address the SQL Access Group’s requirements for a common Call-Level Interface (CLI). ODBC is based on the CLI specs from X/Open6 and ISO/IEC.7 Think of ODBC as OCI made generic – ODBC today is supported in many programming languages, against almost every SQL database. ODBC works by providing a generic SQL call interface, which is then translated into the API of the target database by an ODBC driver. Thus, you can write a program that includes ODBC calls, plug in an Oracle ODBC driver and the program will run against an Oracle database – plug in a SQL Server ODBC driver and it will run, unchanged, against a SQL Server database, and so on.

Of course, the challenge in defining ODBC was in deciding which features to surface. Should ODBC include only those features that were supported by every SQL database, making available just the “lowest common denominator” of functionality? Or should it include every feature available in every SQL database, guaranteeing that programs could not, after all, be ported to other databases without breaking? ODBC represents a compromise between those two extremes (as will XQJ, we predict).

Today, ODBC8 is extremely widely used across SQL databases. While the goal of ODBC is to be programming language- (as well as database-) independent, ODBC does use C syntax and semantics, so it is unsuitable for languages such as Java. Enter JDBC …

JDBC9 (which some people believe is an acronym for Java DataBase Connectivity – it isn’t!) is a datasource-independent API to SQL data sources, just like ODBC, but it’s a Java (rather than C or C++) API. JDBC lets you write standard Java code to access data sources, then supply a data source-specific JDBC driver for each data source. It’s also possible to use JDBC with the JDBC-ODBC Bridge, which acts like an ODBC driver for JDBC. The bridge is useful if, for example, there is an ODBC driver, but no JDBC driver, available for some data source that you want to access.
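To make the driver model concrete, here is a minimal sketch of the query from Section 14.2.1 written against JDBC's standard java.sql interfaces. The connection URL, user name, password, and the movies table are placeholders. The only vendor-specific input is the URL, which DriverManager hands to whichever registered driver accepts it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JdbcMovieTitles {
    // Runs SELECT title FROM movies WHERE yearReleased = ? through the generic JDBC API.
    // Nothing here names a vendor: the URL selects the driver at run time.
    static List<String> titlesReleasedIn(String url, String user, String password, int year)
            throws SQLException {
        String sql = "SELECT title FROM movies WHERE yearReleased = ?";
        List<String> titles = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setInt(1, year); // bind the year parameter
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }
}
```

If no driver on the class path accepts the URL, DriverManager.getConnection throws an SQLException, so the same code runs against a given database only once that database's JDBC driver is plugged in. The ? placeholder anticipates the prepared statements discussed in Section 14.3.2.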

14.2.2 DOM, SAX, StAX, JAXP, JAXB

In Chapter 6, “The XML Information Set (Infoset) and Beyond,” we described the Document Object Model (DOM), and said it was really an API to XML data rather than a data model. The DOM API to XML documents is very popular – for example, it’s used in JavaScript – and the DOM specification defines a language binding for Java (i.e., you can use DOM to access and manipulate XML data, and pass data from XML to Java and back). In the same chapter, we briefly mentioned that DOM is a tree-based (as opposed to an event-based) API. A tree-based API parses an entire XML document and creates a tree structure from it. The API then lets you navigate around the tree. An event-based API parses an XML document, and reports each event (e.g., the start and end of each element) by way of callbacks to the calling program. Obviously, an event-based parser has some footprint advantages – you don’t have to build a complete parse tree to use it. An event-based parser is in some sense a “lower-level” API than a tree-based parser, since you can process the events in any way you want (including building an in-memory tree).10
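For comparison with the event-based APIs below, here is a small tree-based sketch using the DOM parser factory that ships with the JDK. The document structure (movies, movie, title, and a year attribute) is an assumption made for illustration. Because the whole document is built into a tree first, the code can step from each title element back up to its parent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTitles {
    // Tree-based: the whole document is parsed into memory before we look at it.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the titles of movies whose year attribute matches.
    static List<String> titlesReleasedIn(String xml, String year) {
        NodeList titleElements = parse(xml).getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titleElements.getLength(); i++) {
            Element title = (Element) titleElements.item(i);
            // Navigation can go back up the tree: from <title> to its parent <movie>.
            Element movie = (Element) title.getParentNode();
            if (year.equals(movie.getAttribute("year"))) {
                result.add(title.getTextContent());
            }
        }
        return result;
    }
}
```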

SAX – the Simple API for XML – is an event-based API for XML, for use with Java and other languages. The SAX specification is in the public domain, but to write a SAX program you will need to obtain a SAX XML parser. The official website for the SAX project11 lists a number of such parsers, including free downloads from Oracle, IBM, and others. To use SAX, you register an event handler to define a callback method for elements, for text, and for comments. SAX is a serial access API, which means you cannot go back up the tree, or rearrange nodes, as you can with DOM. But SAX has a smaller footprint, and is more flexible.
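A minimal SAX sketch using the JDK's built-in parser follows. The handler's startElement, characters, and endElement methods are the callbacks, and the parser invokes them as it reads. No tree is built, so the handler itself has to remember whether it is currently inside a title element. The element names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxTitles {
    // Callback handler: the parser calls these methods as it encounters each event.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <title> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals("title")) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("title")) {
                titles.add(text.toString());
                text = null;
            }
        }
    }

    static List<String> titles(String xml) {
        TitleHandler handler = new TitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.titles;
    }
}
```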

StAX12 – the Streaming API for XML – is a Java pull parsing API. That is, StAX lets you pull the next item in the document as it parses. You (the calling program) decide when to pull the next item (whereas with an event-based parser, it’s the parser that decides when to cause the calling program to take some action). The StAX parser is ideally suited to state-dependent processing, where you want to treat something differently depending on what comes directly before it. StAX also lets you write XML to an output stream, via the cursor-based XMLStreamWriter or the event-based XMLEventWriter.
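The pull style looks like this with the JDK's javax.xml.stream classes. In this sketch the loop, not the parser, decides when to advance; the second method uses the cursor-based XMLStreamWriter mentioned above. Element names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class StaxTitles {
    // Pull parsing: the program asks for the next event when it is ready.
    static List<String> titles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("title")) {
                    // getElementText() pulls all text up to the matching end tag.
                    titles.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return titles;
    }

    // Cursor-based writing: each call emits the next piece of the output document.
    static String titlesAsXml(List<String> titles) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("titles");
            for (String title : titles) {
                writer.writeStartElement("title");
                writer.writeCharacters(title); // escapes characters such as &
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }
}
```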

JAXP13 – the Java API for XML Processing – is a Java API that lets you parse XML with either SAX or DOM, then process the data in Java, and display it in a variety of formats using XSLT. JAXP includes a pluggability layer so you can plug in any SAX or DOM parser, and/or an XSLT processor.
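As a sketch of the JAXP side, the following applies a small XSLT stylesheet to a document using the JDK's built-in processor. TransformerFactory.newInstance() belongs to the pluggability layer: setting the javax.xml.transform.TransformerFactory system property, or placing another provider on the class path, substitutes a different XSLT processor without changing this code. The stylesheet and element names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class JaxpTitleReport {
    // A tiny XSLT stylesheet that prints one movie title per line as plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//movie'>"
      + "<xsl:value-of select='title'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String report(String xml) {
        try {
            // newInstance() locates the configured XSLT processor (pluggability layer);
            // with no configuration, the JDK's built-in processor is used.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }
}
```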

JAXB14 – the Java API for XML Binding – provides a way to bind an XML schema-compliant document into Java objects (a package of classes and interfaces). Once a binding is defined, you can unmarshall an XML document into a Java content tree (a tree of Java objects), and marshall the Java objects back into XML.

14.2.3 Alphabet-Soup Summary

Confused? OK, before we introduce XQJ, let’s summarize these existing standards for accessing and processing data within a programming language.

• Access data from any SQL database (or other table-oriented data source). The data access language is SQL, and the programming language is C or C++ – ODBC, with an ODBC driver for each data source.

• Access data from any SQL database (or other table-oriented data source). The data access language is SQL, and the programming language is Java – JDBC, with a JDBC driver for each data source (or an ODBC driver for the data source + a JDBC-ODBC bridge).

• Parse XML into a tree structure, then navigate and modify that tree – DOM.

• Parse XML, with a callback method at each event you encounter as you parse – SAX.

• Parse XML, where each item is parsed in response to a request from the calling program (pulled) – StAX.

• Parse XML using either DOM or SAX, process the data in Java, output the data using XSLT – JAXP.

Now, suppose you want to write a Java program that can access any database, and access and manipulate XML data. You could use JDBC to access any SQL database and then retrieve an XML document from a row in a SQL table, then cast the object to an XML class defined in JAXP, then use SAX or DOM to parse it, and manipulate the result with JAXP methods. A little clunky, but doable. But how can you query the XML data? Remember, JDBC uses SQL to query the data source. If the data source is a SQL database that understands the SQL/XML extensions (see Chapter 15, “SQL/XML”), you might be able to query the XML data using XQuery that way. Or perhaps you have a mid-tier XQuery engine that will take JAXP classes and query them. Again, doable but clunky. What we have is a set of useful data model APIs for manipulating XML in Java, but there is currently no language API to call XQuery from Java, in the way that JDBC is a language API to call SQL from Java.

What is needed is a Java API that talks XQuery to any XML data source, returning XML data that can be processed in Java. Think of it as JDBC, where the data access language is XQuery rather than SQL, with JAXP/JAXB-like data mappings from the XQuery Data Model into Java classes. The current proposal for such an API is JSR 225, or XQJ. XQJ is sometimes called “JDBC for XQuery” (just as JDBC is sometimes called “ODBC for Java”).

14.3 XQJ – XQuery for Java

XQJ – XQuery for Java15 – was first mentioned in Chapter 8, “Storing: XML and Databases.” The XQJ spec is under construction as JSR 225, part of the Java Community Process.16 The position of spec lead is shared by Oracle and IBM,17 and there is an Expert Group working with the spec leads. At the time of writing, the latest available spec is an Early Draft Review published in May 2004. In our “Alphabet-Soup Review” above, we looked at two kinds of APIs – language APIs such as ODBC and JDBC that serve as a harness for some query language (SQL), and data model APIs such as DOM and SAX that define ways to construct, access, and manipulate some data structure (such as a DOM tree). XQJ has elements of both, providing a Java harness for the XQuery language, and also methods to manipulate XML objects. In addition, XQJ is designed to run anywhere (client, server, or mid-tier).

In the rest of this section we describe what XQJ does, with examples,18 and in the following section we go back to talking about APIs in general. The examples in this section have been tested against an early version of the RI (Reference Implementation) supplied by Oracle as part of the JCP (Java Community Process). Where an import statement is shown, javax.xml.xquery represents the XQJ classes, while oracle.xquery.xqj represents Oracle’s XQJ driver classes. Most examples are code fragments rather than complete examples.

14.3.1 Connecting to a Data Source

The first thing we need to do is connect to a data source, so we have some data to query over. Just like ODBC and JDBC, XQJ has the following concepts.

• A data source – anything that has data in it, and for which you have an XQJ driver. The data source might be a SQL database, an XML database, or a collection of files. By abstracting out the data source, your XQJ Java program will run against any XML data source (with a suitable XQJ driver).

• A connection – the result of connecting to a data source, usually with some username, password, network protocol, etc.

• A session – an instance of a connection, providing some context for variables and for transactions.

Data Source

The XQJ Early Draft Review describes several ways to make a vendor-specific data source connection (via, e.g., an Oracle driver) available to the Java programmer in a general way. The idea is that a vendor supplies a class that implements the XQDataSource interface (an XQJ driver). For example, Oracle’s XQJ driver class might be called OXQDataSource.

• Instantiate OXQDataSource directly every time you want a data source (see Example 14-1). That is the simplest method, but also the least portable – it introduces a dependency on the Oracle XQJ driver into your code wherever you need to define a data source.

• Use Class.forName to map the class name “XQDataSource” to the XQJ driver class name (see Example 14-2). This allows you to set the driver class name to any string, rather than having to hard-code it in your program.

• Use a system property, and pass in the name of the XQJ driver class on the command line using the -D option.19

• Use the service provider API.20 Specify the fully qualified class name of the XQJ driver class in a file META-INF/services/javax.xml.xquery.XQDataSource.
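The Class.forName, system-property, and service-provider options above can be sketched with nothing but the JDK. In this minimal sketch, DemoDataSource, DemoVendorDataSource, and the demo.DataSource property are hypothetical stand-ins for XQDataSource, a vendor class such as OXQDataSource, and whatever property name XQJ settles on; they are not XQJ APIs. The point is that the application names only the interface, while the driver class arrives from the command line or from a META-INF/services file.

```java
import java.util.ServiceLoader;

// Hypothetical stand-in for a driver-independent interface such as XQDataSource.
interface DemoDataSource {
    String vendor();
}

// Hypothetical stand-in for a vendor's driver class (the role OXQDataSource plays above).
class DemoVendorDataSource implements DemoDataSource {
    public DemoVendorDataSource() { }

    @Override
    public String vendor() {
        return "DemoVendor";
    }
}

class DataSourceLocator {
    // Hypothetical property name, set with e.g. -Ddemo.DataSource=DemoVendorDataSource
    static final String PROPERTY = "demo.DataSource";

    static DemoDataSource locate() {
        // Option 1: a driver class name supplied at run time as a system property,
        // loaded by name with Class.forName.
        String className = System.getProperty(PROPERTY);
        if (className != null) {
            try {
                return Class.forName(className)
                        .asSubclass(DemoDataSource.class)
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load driver class " + className, e);
            }
        }
        // Option 2: the service-provider mechanism, which reads a file named
        // META-INF/services/DemoDataSource on the class path listing the driver class.
        return ServiceLoader.load(DemoDataSource.class)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("No data source driver found"));
    }
}
```

For example, starting the program with java -Ddemo.DataSource=DemoVendorDataSource MyApp (MyApp being a placeholder) makes locate() return the vendor class; with neither the property nor a provider file present, locate() fails rather than guessing a driver.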

Example 14-1   Instantiate a Vendor-Specific Data Source, Simplest Method

Example 14-2   Instantiate a Vendor-Specific Data Source, Using Class.forName

For maximum portability, you should use JNDI (Java Naming and Directory Interface)21 to map a logical name to your XQDataSource object. This additional level of naming indirection improves portability, and it improves maintainability – you can make changes to the underlying XQDataSource object without changing any application code.

XQJ currently defines Username, Password, and MaxConnections as properties of an XQDataSource. Additional standard properties may be defined in the future, and vendors are free to add their own properties with the appropriate setter and getter methods. For example, DataDirect takes the approach of defining all data source properties (and apparently connection properties too) in a configuration file, specified by a proprietary data source property setConfigFile.

We understand that the data source definition (and, in particular, ways to specify an XQJ driver in a portable, generic way) is still under discussion in the JSR 225 Expert Group. Now that you have a flavor of what the data source looks like and what it’s for, we’ll move on to the connection.

Connection

Once you have instantiated a data source object, you can create a connection using the XQConnection class. Example 14-3 creates the connection myConnection, given the data source myDatasource created in Example 14-1.

Example 14-3   Create an XQJ Connection

Example 14-3 shows the simplest possible connection, just as Example 14-1 shows the simplest possible Oracle data source. This default setup allows you to access files in the local directory. The XQJ driver class and getConnection might have parameters, e.g. username and password. XQJ also allows you to “reuse” a JDBC connection, supplying the JDBC connection as a parameter to getConnection. Example 14-4 creates an XQJ connection myConnection, given the data source myDatasource created in Example 14-1 and some JDBC connection myJdbcConnection. Some of the properties defined in a JDBC connection are data source properties – the JDBC values will override any values defined when you set up the XQJ data source.

Example 14-4   Create an XQJ Connection from a JDBC Connection

Session

A session is an instance of a connection object. A session has some corresponding session state, which includes the user ID, a set of XQuery expressions and results, and one or more transactions.

14.3.2 Executing a Query

We have created a data source object, and from that data source (or from a JDBC connection) we have created a connection. Now we can start executing queries.

To execute a query in XQJ, you first need to create an expression object from your connection. In Example 14-5, we create an expression object Expr in the context of connection myConnection, using the createExpression method. For convenience, we then set up a string variable xqueryDirectorLandis that contains the XQuery we want to run. xqueryDirectorLandis is a simple XQuery that evaluates to a sequence of strings. Note that this could be any XQuery expression or an XQueryX document. Finally, we call the executeQuery method of the expression object to run the query and return the results in the variable ResultSeq, of type XQResultSequence.
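The JDK does not include an XQJ implementation, but its javax.xml.xpath API (XPath 1.0, part of JAXP) has a similar shape and is enough to sketch the one-step style: the expression string is parsed, compiled, and evaluated in a single call, and the result comes back as a list the Java program can walk. This is an analogy rather than XQJ code; the document and path expression are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class OneShotQuery {
    // Parses the document, then compiles and evaluates the expression in one call.
    static List<String> run(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            // Map each result node to a Java String.
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```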

Example 14-5   Execute an XQuery

Prepared Expressions

In general, any query (SQL or XQuery) is evaluated in two phases – static evaluation and dynamic evaluation.

The static evaluation phase involves all the processing that can be done with knowledge of only the syntax of the query and the values of literals (that is, without any knowledge of the values of any variables in the query). Static evaluation typically involves building an internal tree structure representing the parsed query, some optimization of the query, and possibly a computation of an execution plan (which indexes to use, in which order). This is sometimes called compiling a query.

Dynamic evaluation typically means plugging the values of variables into a compiled query, and computing the result.

Static evaluation can consume a significant amount of computing resource – if you want to run the same query many times with different values for its variables, it is more efficient to compile the query once and run the compiled version many times, to save the cost of recompiling each time.

JDBC and ODBC both have the notion of a prepared (precompiled) statement. In XQJ, the analogous notion is a prepared (precompiled) expression, since XQuery is an expression-based language. In the next example (Example 14-6), we run the same query as in Example 14-5, but we run it in two stages – first we prepare it, then execute it. Example 14-6 also introduces the notion of variable binding. Instead of preparing a query that returns the title of each movie directed by Landis, we prepare a query that returns the title of each movie directed by the name in the external variable $name, and bind the value “Landis” to $name at run-time. This is a fairly natural use of a prepared expression – you might want to query for movies directed by Landis many times, but it’s much more likely that you want to run the same query with different values for some bind variable, i.e., query for movies by various directors.
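The same two-phase idea can be sketched with JAXP's XPath API, which offers compile() for the static phase and a variable resolver for external variables. In this sketch the expression is compiled once, and each call binds a new value to $name before evaluating. The class, method, and element names are illustrative assumptions; XQJ's own prepared-expression and binding methods differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PreparedTitleQuery {
    // Current values of the expression's external variables, keyed by local name.
    private final Map<String, Object> bindings = new HashMap<>();
    private final XPathExpression compiled;

    PreparedTitleQuery() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The resolver is consulted during each evaluation, not at compile time.
        xpath.setXPathVariableResolver(name -> bindings.get(name.getLocalPart()));
        try {
            // "Static" phase, done once: parse and compile the expression.
            compiled = xpath.compile("//movie[director = $name]/title");
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression", e);
        }
    }

    // "Dynamic" phase, repeated: bind $name, then evaluate the compiled expression.
    List<String> titlesBy(Document doc, String director) {
        bindings.put("name", director);
        try {
            NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Evaluation failed", e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Usage: create one PreparedTitleQuery and call titlesBy(doc, "Landis"), then titlesBy(doc, "Zemeckis"); only the binding changes between calls, and the expression is not recompiled.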

Example 14-6   Execute an XQuery Using a Prepared Expression

14.3.3 Manipulating XML Data

Now we know how to make a connection to some data source, and execute an XQuery to retrieve some result. Now, what can we do with that result? Remember that one of the challenges of a language API is to map results data from the query language (XQuery) to the host language (Java).

XQJ has four data types for storing XML results (XQResultSequence, XQResultItem, XQCachedSequence, XQCachedItem). The result of an XQuery is always an instance of the XQuery Data Model (see Chapter 10, “Introduction to XQuery 1.0”), and the XQuery Data Model is defined in terms of sequences and items. So XQJ includes an XQItem and an XQSequence interface to reflect items and sequences, respectively. XQItem and XQSequence are abstractions – they have subinterfaces XQResultxxx22 and XQCachedxxx23 that can be instantiated. XQResultxxx objects are valid only for the lifetime of a session (connection object), while XQCachedxxx objects can persist across connections.

Before we discuss ways of manipulating this XML data and mapping it into types that Java understands, let’s look at the item and sequence interfaces.

XQItem, XQResultItem, XQCachedItem

In the XQJ Early Draft Review, XQItem is not yet fully defined, but it is expected to have methods for the following.

• Retrieve the (XML Schema) type of the item.

• Check the type of the item against some type.

• If the item is an atomic value, retrieve the value of the item (as a Java atomic value).

• If the item is a node, retrieve the node.

XQItem has two subinterfaces, XQResultItem and XQCachedItem. XQResultItem has the same methods as XQItem, and is created by calling the getItem( ) method of an XQResultSequence object. XQCachedItem has the same methods as XQItem, and is created either by calling the getItem( ) method of an XQCachedSequence, or by calling the createItem( ) method of a data source or connection.

XQSequence, XQResultSequence, XQCachedSequence

An XQSequence is a sequence of zero or more XQItems, with a current position that points to the current item. An XQSequence is either scrollable or forward-only. In the XQJ Early Draft Review, XQSequence is not yet fully defined, but it is expected to have methods to move the current position to the next item, to get the current item, to close the sequence, and to test whether the sequence is closed. In addition, if the XQSequence is scrollable, it will have methods to navigate around its items (move the current position to the first item, last item, previous item, an absolute or relative position, or before the first / after the last item).
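The cursor behavior described here can be sketched in plain Java. ScrollableSequence is a hypothetical class, not the XQJ interface: positions run from 1 to the number of items, 0 means before the first item and one past the end means after the last, and a forward-only sequence rejects every move except next().

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of the cursor semantics described for XQSequence; not the XQJ API.
class ScrollableSequence<T> {
    private final List<T> items;
    private final boolean scrollable;
    // 1..size() is an item; 0 is "before first"; size() + 1 is "after last".
    private int position = 0;
    private boolean closed = false;

    ScrollableSequence(List<T> items, boolean scrollable) {
        this.items = List.copyOf(items);
        this.scrollable = scrollable;
    }

    // Allowed on every sequence, forward-only or scrollable.
    boolean next() {
        checkOpen();
        return moveTo(position + 1);
    }

    T getItem() {
        checkOpen();
        if (position < 1 || position > items.size()) {
            throw new NoSuchElementException("Not positioned on an item");
        }
        return items.get(position - 1);
    }

    // The remaining moves are only for scrollable sequences.
    boolean previous()      { checkScrollable(); return moveTo(position - 1); }
    boolean first()         { checkScrollable(); return moveTo(1); }
    boolean last()          { checkScrollable(); return moveTo(items.size()); }
    boolean absolute(int n) { checkScrollable(); return moveTo(n); }
    boolean relative(int k) { checkScrollable(); return moveTo(position + k); }
    void beforeFirst()      { checkScrollable(); position = 0; }
    void afterLast()        { checkScrollable(); position = items.size() + 1; }

    void close()       { closed = true; }
    boolean isClosed() { return closed; }

    // Out-of-range targets park the cursor before the first or after the last item.
    private boolean moveTo(int target) {
        if (target < 1) {
            position = 0;
            return false;
        }
        if (target > items.size()) {
            position = items.size() + 1;
            return false;
        }
        position = target;
        return true;
    }

    private void checkOpen() {
        if (closed) throw new IllegalStateException("Sequence is closed");
    }

    private void checkScrollable() {
        checkOpen();
        if (!scrollable) throw new UnsupportedOperationException("Forward-only sequence");
    }
}
```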

XQSequence has two subinterfaces, XQResultSequence and XQCachedSequence. XQResultSequence has the same methods as XQSequence (plus methods to retrieve a reference to its connection object, and to retrieve and clear warnings). An XQResultSequence is created by calling one of the query execution methods of XQExpression or XQPreparedExpression.

XQCachedSequence inherits all the methods of XQSequence, and is created by calling the createCachedSequence ( ) method of a data source or connection. An XQCachedSequence persists outside the lifetime of a data source or connection, so it has more uses than an XQResultSequence – this is reflected in the additional methods defined for XQCachedSequence. These include methods to insert, remove, and replace items in the sequence, and to insert whole sequences.

Data Mapping

Now we know that we can use XQJ to execute an XQuery and retrieve XQuery Data Model instances (sequences of items). Since the input to (as well as the output from) an XQuery is always an instance of the XQuery Data Model, this also means we can chain XQueries – use the output from one as the input to another. But to be a useful Java API, XQJ also has to allow us to map these sequences and items to objects and data types that Java can handle natively, and it needs to allow us to pass those sequences and items to some of the already-established Java APIs for manipulating XML data. XQJ does both of those!

XQuery to Java Types

Let’s go back to the example we’ve been building up, and look at some ways to use the result of an XQuery in Java. The simplest way is to take the sequence of titles, which we have in an XQResultSequence object, walk down the sequence using the next ( ) method, and apply the getString ( ) method to each title in turn.

Example 14-7   Print Titles

Now we have enough to show one complete, working example – Example 14-8.

Example 14-8   Complete, Simple XQuery Using XQJ

In Example 14-8, we effortlessly crossed the boundary from the XQuery data world (where everything is an instance of the XQuery Data Model) to the Java data world (where everything is a Java Object or type). We cheated a little to illustrate the simplest possible data mapping – the XQuery that we ran converted the title nodes (elements) into strings using the string ( ) function as part of the return clause. So the query returned a sequence where each member is an item of an atomic type, the XML Schema type “string.” XQJ then converted each of those to a Java string, via the getString( ) method.

XQJ defines mappings between the XML Schema atomic types and Java types, in both directions. We don’t reproduce that mapping here since, at the time of writing, it is only loosely defined – we’ll just point out that such a mapping exists, and is based on the mappings in JAXB and JAXP.

What about the more general case, where an XQuery returns a sequence containing nodes as well as atomic types? XQJ provides a way to deal with those nodes using one of the already-established standards for dealing with XML in Java.

XQuery to XQuery

First, you should be able to deal with a general XQuery sequence (containing nodes and atomic values) by feeding it to another XQuery. Example 14-9 shows that – we first run an XQuery to find all movies where the director is “Landis,” then feed the result into another XQuery that pulls out the titles.24 That’s achieved by declaring an external variable in the second XQuery, and binding to it the result sequence from the first XQuery (in the same way we bound a string to a variable in Example 14-8).
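A rough JAXP analogy: in the sketch below, the first XPath expression returns whole nodes, and each of those nodes becomes the context item for a second expression. That is a simpler route than binding the first result to an external variable of the second query, as the XQJ example does, but the data flows the same way. The expressions are supplied by the caller; for example, //movie[director='Landis'] followed by title.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ChainedQueries {
    // Runs `first` against the document, then `second` against each node it returned.
    static List<String> chain(String xml, String first, String second) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Step 1: keep the intermediate result as nodes, not strings.
            NodeList intermediate = (NodeList) xpath.evaluate(first, doc, XPathConstants.NODESET);
            // Step 2: each intermediate node is the context item for the second expression.
            List<String> results = new ArrayList<>();
            for (int i = 0; i < intermediate.getLength(); i++) {
                results.add(xpath.evaluate(second, intermediate.item(i)));
            }
            return results;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```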

Example 14-9   Chaining XQueries

Now we can chain XQueries, and as long as the last XQuery in the chain returns only atomic values we can map the result to Java values. XQJ also defines ways to handle an XQuery Data Model instance (sequence) in one of the already-established Java standards.

XQuery to DOM, SAX, StAX

We have said (many times) that the input to, and output from, an XQuery are instances of the XQuery Data Model. But any XML document or fragment can be converted to an instance of the XQuery Data Model to be queried, and any Data Model instance can be converted to (serialized as) an XML document or fragment. So it would be nice to be able to take, say, a DOM object and query it with XQuery. It would be equally nice to convert the output from an XQuery into a DOM object for further processing – perhaps you already have some software and tools that can handle DOM objects but not XQuery Data Model instances.

At the time of writing, this area of the XQJ spec is incomplete. The Expert Group seems to recognize the importance of working with DOM, SAX, and StAX – the spec says “The sequence and item interfaces explicitly support XML in [the] form of the DOM data model as specified by org.w3c.dom, the SAX interface, and the StaX interface.” However, these interfaces are not yet defined. Possibly XQJ will also support DOM input to an XQuery by defining a bindxxx( ) method (such as bindDOM( )) to bind a DOM object to an XQuery variable. We await further developments in XQJ with bated breath.
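Whatever form XQJ's DOM support eventually takes, the two conversions it depends on already exist in JAXP. This JDK-only sketch (not an XQJ interface) parses XML text into a DOM tree and serializes any DOM node, including a subtree, back to XML text with an identity transform.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomRoundTrip {
    // XML text -> DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // DOM node (a whole document or any subtree) -> XML text, via an identity transform.
    static String serialize(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }
}
```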

XQuery to Java (Objects)

The XQJ Early Draft Review does mention a general way to handle XQuery output as a Java object, using existing XML-to-Java-object bindings such as JAXB. XQJ provides a pluggable interface called XQCommonHandler to achieve this. A vendor or implementation must provide a class that implements the XQCommonHandler interface, converting the object returned by the getObject( ) method of the result sequence into, say, a JAXB object. Example 14-10 is adapted from an example in the XQJ Early Draft Review.

Example 14-10   A Handler for JAXB

In Example 14-10, the handler is passed to each item in the result sequence. You can also pass a handler to a data source or a connection, as a parameter to the setCommonHandler ( ) method of the data source/connection. In addition to JAXB, handlers might be used to deliver nodes and atomic values as DOM4J or JDOM objects.
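The handler idea can be sketched without JAXB. NodeHandler, Movie, and MovieHandler below are hypothetical and much simpler than XQCommonHandler; the point is only that the conversion loop is written once, and the caller plugs in whichever handler produces the Java objects it wants.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical, much-simplified stand-in for a pluggable handler such as XQCommonHandler.
interface NodeHandler<T> {
    T fromElement(Element element);
}

// An ordinary application class that one handler produces.
class Movie {
    final String title;
    final int year;

    Movie(String title, int year) {
        this.title = title;
        this.year = year;
    }
}

// One pluggable handler: <movie year="..."><title>...</title></movie> -> Movie.
class MovieHandler implements NodeHandler<Movie> {
    @Override
    public Movie fromElement(Element movie) {
        String title = movie.getElementsByTagName("title").item(0).getTextContent();
        return new Movie(title, Integer.parseInt(movie.getAttribute("year")));
    }
}

class HandlerDemo {
    // Written once; the caller chooses the handler and therefore the result type.
    static <T> List<T> convertAll(NodeList nodes, NodeHandler<T> handler) {
        List<T> objects = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            objects.add(handler.fromElement((Element) nodes.item(i)));
        }
        return objects;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Because NodeHandler has a single method, a caller can also plug in a lambda, for example e -> e.getAttribute("year"), to get a List<String> from the same loop.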

14.3.4 Static and Dynamic Context

So far, we have described how XQJ executes XQueries and handles XQuery input and output as objects and data types that Java can understand. Another important aspect of an API is to provide a context in which the language executes.

You read in Chapter 10, “Introduction to XQuery 1.0,” that XQuery context is split into the static context (that which is known before the values of any variables are known) and the dynamic context (that which is known at run-time, after variables have been evaluated).

The static context contains useful information that you might expect to “just exist” in your environment, without having to explicitly set it up each time you run a query. This includes the default element namespace, the default function namespace, and the QName and type of all in-scope variables. The XQStaticContext interface lets you retrieve static context information, but not change it. Note that the static context can be changed by an XQuery prolog – XQStaticContext only lets you see the static context information before any query prolog is processed. XQConnection extends XQStaticContext, so you can query the static context using methods on the connection, as in Example 14-11.

Example 14-11   Querying the Static Context

The dynamic context contains information that is only known at run-time, as the query is being evaluated. XQJ allows you to retrieve and change the implicit time zone, to bind a value to the context item (the “.” in XPaths), and to bind a value to an external variable. We have already seen examples of the latter – using bindString( ) and bindSequence( ) to set the value of a string and a sequence respectively (see Example 14-8 and Example 14-9). The bindxxx( ) methods are part of the XQExpression and XQPreparedExpression interfaces, which extend the XQDynamicContext interface. We would also expect to be able to retrieve and modify the current date and time, and possibly the list of known documents and collections.

14.3.5 Metadata

The goal of XQJ is to provide access to a wide range of XML data sources – files, SQL databases, XML repositories, plus any XML data source that might be invented in the future. You have seen that XQJ achieves this generality via a driver that can be created (or obtained) for a particular data source and plugged in to XQJ. Since it is impossible to completely standardize the functionality available in all those data sources, we need some way for the application programmer to find out about any particular data source and its contents. XQJ achieves this by defining two kinds of metadata, which we’ll call data source metadata and content metadata.

Data Source Metadata

Data source metadata describes the properties of a data source, such as the product identification (name, version, and so on) and the XQuery features that are supported. An XQJ driver must implement the XQMetaData interface, whose methods return this information, so that programs can find out how to interact with the data source.
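The sketch below illustrates how a program might use data source metadata to frame a query. SourceInfo is a hypothetical stand-in invented for this example, not XQMetaData, whose draft methods we do not reproduce here; the product names, namespace URI, and b:BookType are also invented. The first query relies on XQuery's optional Schema Import feature, so the program chooses it only when the data source reports that feature.

```java
// Hypothetical sketch: SourceInfo is invented for illustration and is NOT
// the XQJ XQMetaData interface.
public class MetadataDemo {
    static final class SourceInfo {
        final String productName;
        final boolean schemaImportSupported;

        SourceInfo(String productName, boolean schemaImportSupported) {
            this.productName = productName;
            this.schemaImportSupported = schemaImportSupported;
        }
    }

    // Frame a query that the data source says it can handle.
    static String chooseQuery(SourceInfo info) {
        if (info.schemaImportSupported) {
            // Selects books by schema type; b:BookType is a hypothetical type.
            return "import schema namespace b = \"urn:example:books\"; "
                 + "//element(*, b:BookType)";
        }
        // Fallback without schema import: match book elements by name, in any namespace.
        return "//*:book";
    }

    public static void main(String[] args) {
        SourceInfo[] sources = {
            new SourceInfo("ExampleRepository", true),
            new SourceInfo("ExampleFileStore", false)
        };
        for (SourceInfo s : sources) {
            System.out.println(s.productName + ": " + chooseQuery(s));
        }
    }
}
```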

Content Metadata

Content metadata describes the objects that exist in a particular data source – schemas, modules, collations, functions, collections, documents, and others. An application needs to know which objects exist in order to construct valid XQueries. In JDBC, this information is provided by the DatabaseMetaData interface, with methods such as getTables( ). XQJ will provide similar functionality using, we predict, similar interfaces.

14.3.6 Summary

In this section, we described XQJ, the XQuery API for Java. The XQJ Early Draft Review spec is somewhat patchy, and so we have had to gloss over some areas, but we have given an overview of what XQJ provides, and also some of the flavor of the API itself. Though it is still early in the spec’s lifetime, we think XQJ will provide an excellent foundation for running XQueries, and manipulating their results, in Java. We also believe that the Early Draft Review spec touches all the areas an API needs, and plays nicely with all existing relevant standards (and therefore tools and implementations) in this area. XQJ provides a way to do the following:

• Connect to any data source via a pluggable driver (following the ODBC/JDBC driver model).

• Run XQueries against a connection, either in a single step or in two steps (static evaluation + dynamic evaluation).

• Bind Java data to XQuery variables, so a Java program can define the input to an XQuery.

• Map XQuery atomic values to Java data, so a Java program can handle the output from an XQuery (as long as that output is a sequence of atomic values).

• Store and copy XQuery output as a Data Model Instance, so a programmer can chain XQueries.

• Use DOM, SAX, and StAX to process input to, and output from, XQueries (though this area of the spec is still under construction).

• Use a supplied handler to convert XQuery output to, say, a DOM or JAXB object.

• Retrieve and, in some cases, change parts of the static and dynamic contexts.

• Retrieve metadata about a data source and its capabilities, so that a program can frame appropriate queries for a wide range of kinds of data sources.

At the time of writing, the JSR 225 Expert Group is meeting regularly and working hard to produce a second draft spec. Oracle is working on an RI (Reference Implementation), and IBM is working on a TCK (Technology Compatibility Kit).25 For further reading, see:

• The JavaOne 2004 paper “XQuery API for Java (XQJ) Technology,” by Jim Melton and Andrew Eisenberg26 (the joint spec leads on JSR 225 / XQJ). Note: some of the examples in this section were adapted from this JavaOne presentation.

• The XQJ online tutorial from DataDirect Technologies that we referenced earlier in this chapter.

• The JSR 225 home page on the Java Community Process website.27

• Or, of course, Google for XQJ.

Be forewarned. This chapter presented XQJ as it was defined in mid-2004. The Expert Group has been busy at work and we would be surprised if there were not very significant – even fundamental – changes in XQJ when it is next seen in public.

14.4 SQL/XML

You will read about SQL/XML in Chapter 15, “SQL/XML,” so we won’t describe it here, but we will take a few lines to describe SQL/XML as an XQuery API. We say in Chapter 15, “SQL/XML,” that SQL/XML provides a harness in which to run XQuery in the context of the SQL language, in much the same way as XQJ provides a harness in which to run XQuery in the context of the Java language. Let’s compare SQL/XML to XQJ, using the list of features in Section 14.3.6.

• Connect to any data source – SQL/XML provides a data source via the XML type. The actual data queried is generally stored in the SQL database, and is available via tables and/or views. There is no need to define any special connection, since there are standard ways to connect to a SQL database (e.g., ODBC).

• Run XQueries – SQL/XML lets you run an XQuery using either XMLQUERY ( ) or XMLTABLE ( ).

• Map between XQuery input/output data and data in the native language – SQL/XML provides a native SQL type, XML, which implements the XQuery Data Model. SQL/XML also defines a mapping from SQL names and types to XQuery names and types. This mapping is used in the publishing functions (such as XMLELEMENT( )), which create XML data from SQL data, and in the XMLQUERY( ) function, which lets you pass SQL data into an XQuery.

• Context and metadata – SQL/XML statements run in the SQL context, and so all the SQL context information and metadata is available via SQL standard methods.

So which is better, XQJ or SQL/XML? First, SQL/XML is only useful if you want to use XQuery to query XML data stored in a SQL database (or XML views of data stored in a SQL database). If some of the data sources you want to query are not stored in a SQL database, and you want to use a consistent API across all your data, and your application is written in Java, then you should choose XQJ. If, on the other hand, all your data is stored in a SQL database, then you have a choice between using XQJ and using SQL/XML plus JDBC. Your choice then would probably depend on the tools and skills available to you, and on the relative efficiency of the implementations you are choosing between. It may well be that a particular vendor can execute XQueries more efficiently when they are presented as SQL/XML than when they are presented as XQJ. Then again, XQJ leaves the door open for more flexible query processing – for example, an XQJ driver might operate on both mid-tier (cached) data and back-end (persistent database) data.

14.5 Looking Ahead

As XQuery becomes more widely accepted, we expect other languages (that is, other than SQL and Java) to have APIs to XQuery, especially Microsoft’s .NET. The initial XQJ work provides an excellent template for XQuery APIs in other programming languages. SQL/XML and XQJ have both shown that it’s possible to leverage a lot of existing work (in standards and in implementations) when defining XQuery APIs. The future looks bright!


1Information about JSR 225: XQuery API for Java (XQJ) can be found at http://www.jcp.org/en/jsr/detail?id=225. JSR 225 (XQJ) is still under construction and may change radically before it becomes a standard.

2As you will read in Section 14.3.4, this environment is split into the static context and the dynamic context.

3SQL*Plus is of course a program, and it uses the Oracle Call Interface (OCI) mentioned below to communicate with the Oracle database.

4One of the authors once wrote a Perl (CGI) program that spawned a process called SQL*Plus, then used a pipe to send queries to, and get results from, that process. The Perl caller then parsed the results into an array and processed them further. This can be fun, but is not recommended for enterprise programming projects.

5See http://www.oracle.com/technology/tech/oci/index.html.

6Structured Query Language (SQL), C201 (X/Open CAE Specification) (Reading, U.K.: X/Open Company Ltd., 1992).

7The original SQL/CLI spec is ISO/IEC 9075-3:1995 (E) Call-Level Interface (SQL/CLI). Available at: http://www.nist.fss.ru/hr/doc/mstd/iso/9075-3-95.htm.

8For more on ODBC, see MSDN, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/htm/dasdkodbcoverview.asp. For more on ODBC drivers, see DataDirect, http://www.datadirect.com/products/odbc/index.ssp.

9See Sun’s JDBC page at http://java.sun.com/products/jdbc/.

10For a concise comparison of tree-based and event-based APIs, see http://www.saxproject.org/event.html.

11http://www.saxproject.org/.

12http://www.jcp.org/en/jsr/detail?id=173. Note that, at the time of writing, DOM, SAX, and JAXP are all published standard APIs, while StAX is still under development.

13http://java.sun.com/xml/jaxp/.

14http://java.sun.com/xml/jaxb/. See especially: Scott Fordin, Java Architecture for XML Binding: Executive Summary (July 2003). Available at: http://www.sun.com/software/xml/developers/jaxb/index.xml.

15For an excellent tutorial on XQJ, see: Jonathan Robie, Jonathan Bruce, An XQJ Tutorial: Introduction to the XQuery API for Java. Available at: http://www.datadirect.com/developer/xquery/topics/xqj_tutorial/index.ssp.

16For more on the Java Community Process, see Appendix C: The Standardization Processes.

17The Oracle spec lead is an author of this book.

18At the time of writing, there is no reference implementation (RI) publicly available. Oracle is working on the RI, and the authors were fortunate to have access to an early version of that RI to test some of the examples.

19If you’re using a command line, that is.

20See the JAR specification at http://java.sun.com/j2se/1.3/docs/guide/jar/jar.html.

21http://java.sun.com/products/jndi/.

22XQResultItem, XQResultSequence.

23XQCachedItem, XQCachedSequence.

24Note that we’ve gone back to showing an example snippet – we assume the variables for expressions and results have been initialized, and a connection has been created. See Appendix A: The Example for full example listings.

25From the JCP (Java Community Process) Process Document, available at: http://www.jcp.org/en/procedures/jcp2
Reference Implementation (RI): The prototype or “proof of concept” implementation of a Specification.
Technology Compatibility Kit (TCK): The suite of tests, tools, and documentation that allows an organization to determine if its implementation is compliant with the Specification.

26Jim Melton and Andrew Eisenberg, XQuery API for Java (XQJ) Technology (JavaOne conference, 2004).

27http://www.jcp.org/en/jsr/detail?id=225.
