Chapter 10. Querying RDF: RDF as Data

RDF as a model for metadata and RDF/XML as a way of serializing the model are interesting, but the power of the specifications lies in our ability to access the data easily, using techniques we’re familiar with from other data models, such as the relational data model discussed in Chapter 6.

It is only natural that techniques used for one data model should be adapted for use with another; so the method for accessing the relational data model, Structured Query Language (SQL), is used in a similar manner with RDF/XML through language techniques such as SquishQL, RDQL, RQL, and others.

Tip

Many of the query languages and schemas mentioned in this chapter are also covered in an online document at http://www.w3.org/2001/11/13-RDF-Query-Rules. In addition, if your interest is more inclined to RDF as data (or to the more logical side of RDF), check out the www-rdf-rules discussion list at http://lists.w3.org/Archives/Public/www-rdf-rules/.

RDF and the Relational Data Model

RDF and the relational data model are both metadata models, so it’s natural to want to see how the one can work with the other. Stanford took a look at different designs of tables for storing RDF data in an online paper located at http://www-db.stanford.edu/~melnik/rdf/db.html. With some differences based on data types and the ability to store multiple models, most of the schemas demonstrated were basically the same—store the model as triples, with or without support for additional information such as namespace or model identifier.

Tip

An updated document comparing RDBMS and Semantic Web data is located at http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/.

If you look at implementations that store RDF within relational databases, these simple overlay schemas are used, for the most part, by all of them. For instance, Jena gives you a couple of different options in database storage; the first is whether multiple models are supported, and the second is whether a hash is used to generate the identifiers for the resources. However, the basic structure of the database is the same—a table for storing statements, with secondary tables storing literals (which could get quite large), resources, and namespaces.

Siderean Software’s Seamark server (covered in Chapter 15) also uses a basic layout for storing its data, with separate tables for resource and literal and another table pulling together the triples (in addition to specific information about accessing the model). However, other applications, such as Plugged In Software’s Tucana Knowledge Store, use a data storage schema that is built from the ground up based on RDF, and make no use of relational data stores at all.

Tip

Another online white paper that discusses the relational data model and RDF directly is “Relational Databases and the Semantic Web” at http://www.w3.org/DesignIssues/RDB-RDF.html.

Roots: rdfDB QL

One of the earliest persistent data stores for RDF was R.V. Guha’s rdfDB, a database built from the ground up to store RDF data. This database, written in C and primarily tested within a Linux environment, uses a specialized language derived from SQL, a language he called “...a high level SQLish query language,” to manipulate and query RDF data within the database.

Tip

You can download a copy of rdfDB at http://guha.com/rdfdb/. Note that there has been little activity with this database in the last few years; I’m including coverage of it here primarily for historical perspective.

In Guha’s language, you can create a database, insert or delete rows from it, and query it. A row in his language would be an RDF triple, in the format of arc-source-target, somewhat different from N-Triples and other languages that portray an RDF triple as source-arc-target. However, the principles are the same.

For instance, to insert a row, use the following syntax (taken from Guha’s sample session online):

insert into test1 (type DanB Person), (name DanB 'Dan Brickley') </>

If the result is successful, the database returns 0; otherwise, a negative value representing the type of error that occurred with the statement is returned.

The data is queried by forming a select statement that provides a variable or variables for resulting data, a from clause giving the database name, and a where clause made up of triples in the format of arc-source-target, with placeholders in the position of unknown values. Again from the sample he provides at his web site:

select ?x from test1 where (worksFor ?x W3C) (name ?x ?y) </>

The results are returned on separate lines, variables mapped to values:

?x = DanC ?y = 'Dan Connolly'
?x = DanB ?y = 'Dan Brickley'
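
To make the matching step concrete, the following is a minimal Java sketch of how a single arc-source-target pattern could be checked against stored rows. It is not rdfDB's implementation: the class name ArcSourceTargetMatcher, its methods, and the sample rows (modeled on the session output above) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: matches one arc-source-target pattern against rows.
class ArcSourceTargetMatcher {

    // Returns one variable-to-value binding per row that matches the pattern.
    // Pattern terms starting with '?' are variables; all others must match exactly.
    static List<Map<String, String>> match(List<String[]> rows, String[] pattern) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String[] row : rows) {
            Map<String, String> binding = new LinkedHashMap<>();
            boolean matches = true;
            for (int i = 0; i < 3 && matches; i++) {
                String term = pattern[i];
                if (term.startsWith("?")) {
                    // A variable used twice in one pattern must bind to the same value.
                    String previous = binding.put(term, row[i]);
                    matches = previous == null || previous.equals(row[i]);
                } else {
                    matches = term.equals(row[i]);
                }
            }
            if (matches) {
                results.add(binding);
            }
        }
        return results;
    }

    // Rows in arc-source-target order (assumed data, modeled on the sample session).
    static List<String[]> sampleRows() {
        return Arrays.asList(
            new String[] {"worksFor", "DanC", "W3C"},
            new String[] {"name", "DanC", "Dan Connolly"},
            new String[] {"worksFor", "DanB", "W3C"},
            new String[] {"name", "DanB", "Dan Brickley"});
    }

    public static void main(String[] args) {
        String[] pattern = {"worksFor", "?x", "W3C"};
        for (Map<String, String> binding : match(sampleRows(), pattern)) {
            System.out.println(binding);
        }
    }
}
```

The sketch handles only one pattern at a time; a query with several patterns, like the one above, would also need to carry bindings from one pattern to the next.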

Though Guha’s rdfDB was the precursor to much of the effort in querying RDF, he hasn’t worked on the database recently. However, others took up the effort he pioneered and have since worked to enhance and improve on it. Among these efforts are the Inkling database and SquishQL, an open source effort that included contributions from Leigh Dodds, Libby Miller, and Dan Brickley.

Inkling and SquishQL

Unlike rdfDB, written in C in a Linux environment, the Inkling database was written in Java, originally on Linux and Solaris and most recently hosted and tested on Mac OS X, using several Java JDBC classes. Though I’ve tried it only on the Mac OS X environment myself, it should work in other environments that have Java installed. An additional requirement for Inkling is an installation of PostgreSQL, as it uses this database for persistent storage (unlike rdfDB, which manages its own storage).

Tip

You can view documentation and test the Inkling database online at http://swordfish.rdfweb.org/rdfquery/. You can also download source code for Inkling at this site. Note that Inkling uses PostgreSQL for its persistent data store. If you don’t want to install Inkling to your own system, you can also use the online test application, running it against your own persisted RDF/XML documents available on the Web.

Once you’ve downloaded the Inkling installation file, you’ll first need to make sure that you have a database called test created, and that you’ve run the SQL commands contained in the inklingsqlschema.psql file. You’ll also need to set JAVA_HOME. In the Mac OS X environment, JAVA_HOME is set to /Library/Java/Home if you’re using the Java installations that are designed specifically for Mac OS X.

The data structure loaded into the PostgreSQL database is relatively simple—one table containing pointers (hashed values) to the actual values in a second table. A flag specifies if the value is a resource or an actual object. If I have anything to disagree with about this design, it’s the combination of resources and objects in one table. Resource URIs are typically Unicode character strings most likely not more than a few hundred characters or so in length. Objects (literals), though, can be large. My test file used in many of the other examples in this book (http://burningbird.net/articles/monsters1.rdf ) has objects that can be several thousand characters in length. Normally, a better design would have been to separate out the known resources into a separate table or even two tables—one for predicates, one for subjects. However, that’s a personal preference.
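
The layout just described can be pictured with a small in-memory stand-in. The Java sketch below is not Inkling's schema or code; the class HashedTripleStore, its methods, and the use of String.hashCode() as the hash are assumptions chosen to keep the illustration short. It keeps one collection of hashed pointers (the statements) and one collection of values, with a flag marking each value as a resource or a literal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a two-table layout:
// a values table keyed by hash, and a statements table of hash triples.
class HashedTripleStore {

    // One row of the values table: the text plus the resource/literal flag.
    static final class Value {
        final String text;
        final boolean isResource;

        Value(String text, boolean isResource) {
            this.text = text;
            this.isResource = isResource;
        }
    }

    private final Map<Integer, Value> values = new HashMap<>();
    private final List<int[]> statements = new ArrayList<>();

    // Stores a value once and returns its key. String.hashCode() keeps the
    // sketch short; a real store would need a collision-resistant hash.
    private int intern(String text, boolean isResource) {
        int key = ((isResource ? "R:" : "L:") + text).hashCode();
        values.put(key, new Value(text, isResource));
        return key;
    }

    // Adds one statement; subject and predicate are treated as resources.
    void add(String subject, String predicate, String object, boolean objectIsResource) {
        statements.add(new int[] {
            intern(subject, true), intern(predicate, true), intern(object, objectIsResource)});
    }

    // Returns the object text of every statement with this subject and predicate.
    List<String> objectsOf(String subject, String predicate) {
        int s = ("R:" + subject).hashCode();
        int p = ("R:" + predicate).hashCode();
        List<String> found = new ArrayList<>();
        for (int[] statement : statements) {
            if (statement[0] == s && statement[1] == p) {
                found.add(values.get(statement[2]).text);
            }
        }
        return found;
    }

    // Number of distinct rows in the values table.
    int valueCount() {
        return values.size();
    }
}
```

Because resources and literals share the one values map, a long literal sits alongside short URIs, which is the design trade-off discussed above.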

You can access several demonstration applications installed with Inkling or the online application. You can also use a set of Java classes that support the application directly. Of particular interest among these is a JDBC driver created specifically for Inkling-formatted data, allowing you to run a SquishQL-formatted query against data held in the PostgreSQL database. However, we’re more interested at this point in the queries, which we’ll focus on in the rest of this section.

Tip

The example file used throughout this chapter is monsters1.rdf, from Example 6-6.

The SquishQL supported in Inkling has strong ties to SQL. A simple query is similar to the following:

SELECT ?subject
FROM http://burningbird.net/articles/monsters1.rdf
WHERE (dc::subject ?x ?subject)
USING dc FOR http://purl.org/dc/elements/1.1/

In this query, triples form a where clause, leading with the predicate, followed by subject and then by object. If the query uses a variable as placeholder, all values in that field are returned. For this example, all dc:subject predicates are returned regardless of specific subject or object value.

The query is being made against a file rather than the default database (and can be accessed remotely via a URL), which is noted in the FROM clause. The SELECT clause lists the value or values returned, and the USING clause gives a mapping between the predicate URI and the abbreviation for the URI. It’s important to note that the using clause isn’t a namespace prefix, but a way of providing abbreviations for longer URIs. This could mean a specific namespace but isn’t limited only to namespaces formally identified within the RDF/XML document.
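
Since the using clause is plain abbreviation expansion, it can be sketched in a few lines of Java. This is an illustration under assumptions, not Inkling's code: the class UsingClauseExpander and its expand method are hypothetical names, and the sketch assumes that variables start with a question mark and literals with a double quote.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of how a USING clause maps abbreviations to full URIs.
class UsingClauseExpander {

    // Expands "abbrev::local" into the full URI; other terms are returned unchanged.
    static String expand(String term, Map<String, String> abbreviations) {
        // Variables and quoted literals are never abbreviated.
        if (term.startsWith("?") || term.startsWith("\"")) {
            return term;
        }
        int separator = term.indexOf("::");
        if (separator < 0) {
            return term;
        }
        String base = abbreviations.get(term.substring(0, separator));
        if (base == null) {
            throw new IllegalArgumentException("No USING entry for: " + term);
        }
        // Simple concatenation: the abbreviation stands for any URI prefix,
        // not necessarily a namespace declared in the RDF/XML document.
        return base + term.substring(separator + 2);
    }

    public static void main(String[] args) {
        Map<String, String> using =
            Collections.singletonMap("dc", "http://purl.org/dc/elements/1.1/");
        System.out.println(expand("dc::subject", using));
    }
}
```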

The variables begin with a question mark and consist of characters, with no spaces. Figure 10-1 shows both this query and the output format as given in the Inkling online query application.

Figure 10-1. Preparing to run a query against the test RDF document

After submitting the form, a second page opens up displaying the results:

The subject is Loch Ness Monster 
The subject is giant squid 
The subject is legends 
The subject is Architeuthis Dux 
The subject is Nessie

You can also make more complex queries. For instance, to find all uses of pstcn:reason associated with movements, rather than with related resources, you can join query triples to return specific predicates for given resources that are themselves identified by other predicates; in this case, a predicate of rdf:type of http://burningbird.net/postcon/elements/1.0/Movement, as shown in Example 10-1.

Example 10-1. Finding all reasons for movements within test RDF/XML document
SELECT ?resource ?value
FROM http://burningbird.net/articles/monsters1.rdf
WHERE (rdf::type ?resource "http://burningbird.net/postcon/elements/1.0/Movement")
      (pstcn::reason ?resource ?value)
USING pstcn FOR http://burningbird.net/postcon/elements/1.0/
      rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#

In this example, the first triple looks for all resources with a given rdf:type of http://burningbird.net/postcon/elements/1.0/Movement. These are then passed into the second triple in the subject field, fine-tuning the reasons returned to those associated with movement resources. In the example, predicates from two namespaces are used, as shown in the using clause. In addition, two values are returned in the select clause and printed out:

The reason for the movement to http://www.dynamicearth.com/articles/monsters1.htm is Moved to separate dynamicearth.com domain
The reason for the movement to http://burningbird.net/articles/monsters1.htm is Collapsed into Burningbird
The reason for the movement to http://www.yasd.com/dynaearth/monsters1.htm is New Article

This combining of triple patterns is known as following one specific path within an RDF model, of node-arc-node-arc-node and so on. You can add additional triple patterns to travel further down the path until you reach the data you’re after, no matter how deeply nested within the model. The key is to use a variable assigned data in one triple pattern—such as a subject or object value—as one of the constraints in the next triple pattern and so on.
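
The way a variable bound in one triple pattern constrains the next can be sketched as a simple nested-loop join. The Java below is an illustration only, not how Inkling evaluates queries: the class TriplePathQuery, its methods, and the sample triples (with made-up identifiers such as move1 and related1) are assumptions. Triples and patterns use the predicate-subject-object order of SquishQL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: joins triple patterns by carrying variable bindings forward.
class TriplePathQuery {

    // Tries to extend a binding so that the pattern matches the triple.
    // Returns null when a constant or an already-bound variable disagrees.
    static Map<String, String> extend(Map<String, String> binding,
                                      String[] pattern, String[] triple) {
        Map<String, String> next = new HashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                String bound = next.get(term);
                if (bound == null) {
                    next.put(term, triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null;
            }
        }
        return next;
    }

    // Applies each pattern in turn, keeping only bindings that can be extended.
    static List<Map<String, String>> solve(List<String[]> triples, List<String[]> patterns) {
        List<Map<String, String>> bindings = new ArrayList<>();
        bindings.add(new HashMap<String, String>());
        for (String[] pattern : patterns) {
            List<Map<String, String>> extended = new ArrayList<>();
            for (Map<String, String> binding : bindings) {
                for (String[] triple : triples) {
                    Map<String, String> next = extend(binding, pattern, triple);
                    if (next != null) {
                        extended.add(next);
                    }
                }
            }
            bindings = extended;
        }
        return bindings;
    }

    // Made-up sample data in predicate-subject-object order.
    static List<String[]> sampleTriples() {
        return Arrays.asList(
            new String[] {"rdf:type", "move1", "pstcn:Movement"},
            new String[] {"pstcn:reason", "move1", "New Article"},
            new String[] {"rdf:type", "related1", "pstcn:Resource"},
            new String[] {"pstcn:reason", "related1", "Background reading"});
    }

    public static void main(String[] args) {
        List<String[]> patterns = Arrays.asList(
            new String[] {"rdf:type", "?resource", "pstcn:Movement"},
            new String[] {"pstcn:reason", "?resource", "?value"});
        System.out.println(solve(sampleTriples(), patterns));
    }
}
```

Adding a third pattern that reuses ?value or ?resource would travel one step further down the path in the same way.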

In addition to filtering based on triple pattern matching, you can also use more traditional query constraints such as the less-than (<) and greater-than (>) operators and equality (= and ~). All of the comparison operators work with integers except for the string equality operator (~).
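
As a rough picture of how such constraints behave, the following Java sketch evaluates one constraint against a bound value. It follows only the description above (integer comparison for <, >, and =, string comparison for ~); the class SquishConstraint and its test method are hypothetical and are not taken from Inkling.

```java
// Hypothetical sketch of evaluating a single SquishQL-style constraint.
class SquishConstraint {

    // Evaluates "left op right". The ~ operator compares strings;
    // <, >, and = parse both sides as integers and compare the numbers.
    // Non-numeric input to the integer operators raises NumberFormatException.
    static boolean test(String left, String op, String right) {
        if (op.equals("~")) {
            return left.equals(right);
        }
        long a = Long.parseLong(left.trim());
        long b = Long.parseLong(right.trim());
        switch (op) {
            case "<":
                return a < b;
            case ">":
                return a > b;
            case "=":
                return a == b;
            default:
                throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(test("5", "<", "10"));
        System.out.println(test("Add", "~", "Add"));
    }
}
```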

In Example 10-2, the string equality operator is used to return a resource from a movement on a specific date.

Example 10-2. Find movement resource where movement occurred on a specific date
SELECT ?resource 
FROM http://burningbird.net/articles/monsters1.rdf
WHERE (rdf::type ?resource "http://burningbird.net/postcon/elements/1.0/Movement")
      (dc::date ?resource ?date)
AND ?date ~ "1999-10-31:T00:00:00-05:00"
USING pstcn FOR http://burningbird.net/postcon/elements/1.0/
      rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
      dc FOR http://purl.org/dc/elements/1.1/

The example just shown is about as complex as an RDF query gets, regardless of the specific query language. Variations just add constraints, namespaces, sources (such as multiple documents), and so on, but the basic structure remains the same:

SELECT variables
FROM source
WHERE (triple clause)
USING namespace mapping

The type of query language demonstrated, beginning with rdfDB and continuing with SquishQL, is the one that’s formed the basis of one of the more popular RDF/XML query languages, RDQL, demonstrated in the next section.

RDQL

The RDQL language is based on the earlier work of Guha’s rdfDB QL and SquishQL, with some relatively minor differences. Its popularity is ensured because of its use within Jena, probably the most widely used RDF API.

RDQL supports the same select, from, where, and using clauses as SquishQL, with some exceptions. Additionally, RDQL syntax can vary based on the implementation and on whether you’re using a Java API such as Jena, a PHP class such as the PHP XML classes, or a Perl module such as RDFStore. However, though the syntax varies within the clauses, the concepts remain the same.

Variables are in the format of a question mark, followed by other characters, just as in SquishQL:

?<identifier>

However, one difference between SquishQL and RDQL occurs in the select clause, which requires commas rather than spaces to separate all variables.

The from, or source, clause, can be omitted with RDQL depending on the implementation. For instance, in Jena, the source of the RDF/XML can be specified and loaded separately through a separate class method or can be given directly in the query. However, in the PHP RDF/XML classes, the from clause must be provided within the query. The same applies to RDFStore, which also requires that the URL be surrounded by angle brackets.

The where clause (or triple pattern clause) differs in that the pattern follows the more traditional subject-predicate-object ordering, and URIs are differentiated from literals by being surrounded by angle brackets. However, the way that triple patterns are combined to form more complex queries is the same in RDQL and SquishQL.
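
The reordering is mechanical enough to sketch in Java. The class SquishToRdql below is hypothetical and only illustrates the two differences just described (term order and angle brackets). It assumes that anything in double quotes is a literal, so a resource URI written in quotes, as in the SquishQL examples earlier, would have to be adjusted by hand.

```java
// Hypothetical sketch: rewrites a SquishQL triple pattern in RDQL form.
class SquishToRdql {

    // Variables and quoted literals pass through; everything else is treated
    // as a URI or abbreviated URI and wrapped in angle brackets.
    static String term(String squishTerm) {
        if (squishTerm.startsWith("?") || squishTerm.startsWith("\"")) {
            return squishTerm;
        }
        return "<" + squishTerm.replace("::", ":") + ">";
    }

    // SquishQL order is predicate-subject-object; RDQL is subject-predicate-object.
    static String pattern(String predicate, String subject, String object) {
        return "(" + term(subject) + ", " + term(predicate) + ", " + term(object) + ")";
    }

    public static void main(String[] args) {
        System.out.println(pattern("dc::subject", "?x", "?subject"));
    }
}
```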

RDQL has greater sophistication in incorporating comparison semantics with the triple pattern within the constraint clause. The use of AND is the same, but other operators are also supported, such as the OR operator (||), bitwise operators (& and |), and negation (!).

Within Jena, there is no using clause because the namespaces for the resources are included with the resource rather than being listed as a separate namespace. However, the PHP XML classes support using, as does RDFStore.

Jena’s RDQL and the Query-O-Matic

In addition to the rich set of Java classes that allow access to individual triples as well as the ability to build complex RDF/XML documents (as described in Chapter 8), Jena also provides specialized classes for use with RDQL:

Query

The Query class manages the actual query, which can be built through an API or passed in as a string.

QueryExecution

Query engine interface.

QueryEngine

The actual execution of the query (the intelligence behind the query process).

QueryResults

The iterator that manages the results.

ResultBinding

Mapping from variables to values.

In addition to these standard classes, more recent versions of Jena also include additional classes, such as a QueryEngineSesame class, which works against the Sesame RDF repository (discussed at the end of the chapter).

The use of the classes is very straightforward. Use Query to build or parse the query, which is then passed to QueryEngine for processing. The results are returned to the QueryExecution class, which provides methods to access the results, which are assigned to QueryResults. To access individual items in the results, the data is bound to program variables using ResultBinding.

To demonstrate how Jena works with RDQL, I created a dynamic query application, which I call the Query-O-Matic, building it in Java as a Tomcat JSP application.

The Query-O-Matic

The Query-O-Matic is a two-page application, with the first HTML page containing a form and the second JSP page processing the form contents. It’s built using Jena 1.6, and managed with Tomcat. The source code is included as part of the example code for the book.

Tip

The Query-O-Matic does require that you have knowledge of Tomcat and JSP-based applications. If you don’t, you can still work with the code, but you’ll need to provide a different interface for it. You can get more details about Jena’s RDQL support in the RDQL tutorial at http://www.hpl.hp.com/semweb/doc/tutorial/RDQL/index.html.

To create the application, the Jena .jar files must be copied to the common library or to the application-specific WEB-INF lib directory. I copied them to the common library location because I use Jena for several applications.

The first page is nothing special, an HTML form with three fields:

  • The first field is a text input field to hold the URL of the RDF/XML document.

  • The second field is a textarea to hold the actual query.

  • The third field is another text input field to hold the variable that’s printed out.

Figure 10-2 shows the page containing the form, as well as links to sample RDF/XML documents.

Figure 10-2. Form to capture RDQL parameters

In the JSP page, the form values are pulled from the HTTP request. The URL is used to load the document; once it is loaded, the query is run against the document using the Jena QueryEngine class. To iterate through the results, another class, QueryResults, is created, and each record returned from the query is then bound to a specific object, in order to access a specific value. The variable name that’s passed from the form is used to pull the value from the object, and the value is printed out, as shown in Example 10-3. Once all values are processed, the result set is closed.

Example 10-3. Java/JSP code to dynamically process RDQL query using Jena
<html>
<%@ page import="com.hp.hpl.mesa.rdf.jena.mem.*,
                 java.io.File,
                 java.util.*,
                 com.hp.hpl.mesa.rdf.jena.model.*,
                 com.hp.hpl.mesa.rdf.jena.common.*,
                 com.hp.hpl.jena.util.*,
                 com.hp.hpl.jena.rdf.query.*,
                 com.hp.hpl.jena.rdf.query.parser.*" %>

<body>

<%
   try {
      // pull the form values from the HTTP request
      String sUri = request.getParameter("uri");
      String sQuery = request.getParameter("query");
      String sResult = request.getParameter("result");

      // load the RDF/XML document into an in-memory model
      ModelMem model = new ModelMem();
      model.read(sUri);

      // parse the query string and point it at the model
      Query query = new Query(sQuery);
      query.setSource(model);

      // run the query
      QueryExecution qe = new QueryEngine(query);
      QueryResults results = qe.exec();
      out.print("<h1>test</h1>");

      // print the requested variable from each result row
      for (Iterator iter = results; iter.hasNext(); ) {
         ResultBinding env = (ResultBinding) iter.next();
         Object obj = env.get(sResult);
         // skip rows where the requested variable is not bound
         if (obj != null) {
            out.print(obj.toString());
            out.print("<br>");
         }
      }

      // close results
      results.close();
   }
   catch (Exception e) {
      out.print(e.toString());
   }
%>
<br>
</body>
</html>

Once the two pages and supporting Jena .jar files are installed into Tomcat, we’re ready to try out some RDQL in the Query-O-Matic.

Trying out the Query-O-Matic

The simplest test of the Query-O-Matic is to run an RDQL variation of the first query made with Inkling/SquishQL, which is to find all the dc:subject predicates in the RDF/XML document and print out the associated object values. The contents of the form are given in Example 10-4.

Example 10-4. RDQL query to find dc:subject in RDF/XML document
uri: http://burningbird.net/articles/monsters1.rdf
query: SELECT ?subject
            WHERE (?x, <dc:subject>, ?subject)
            USING dc FOR <http://purl.org/dc/elements/1.1/>
result: subject

Comparing this with the SquishQL example shows that both are basically the same with minor syntactic differences. When the form is submitted and the query processed, the results returned are exactly the same, too.

Another slightly more complicated query is shown in Example 10-5, which demonstrates traversing two arcs in order to find a specific value.

Example 10-5. More complex query traversing two arcs
SELECT ?value
WHERE (?resource, <rdf:type>, <pstcn:Movement>),
(?resource, <pstcn:reason>, ?value)
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

Notice that object values that are resources are treated the same as the subject and predicate values, with angle brackets around the URI (or the QName). The only type of value that doesn’t have angle brackets is literals.

A slightly more complicated query more fully demonstrates the filtering capability of the triple pattern. To better understand how this query works, take a look at the N-Triples of the statements of the subgraph from the monsters1.rdf example:

<http://burningbird.net/articles/monsters1.htm> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://burningbird.net/postcon/elements/1.0/Resource> .
<http://burningbird.net/articles/monsters1.htm> <http://burningbird.net/postcon/elements/1.0/presentation> _:jARP10030 .
_:jARP10030 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP10032 .
_:jARP10032 <http://burningbird.net/postcon/elements/1.0/type> "logo" .
_:jARP10032 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/mm/dynamicearth.jpg" .
_:jARP10030 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP10031 .
_:jARP10031 <http://burningbird.net/postcon/elements/1.0/type> "stylesheet" .
_:jARP10031 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/de.css" .

These are the statements we’ll be querying with the code shown in Example 10-6. Within the query, the pstcn:presentation arc is followed from the main resource (monsters1.htm) to get the object/resource for it (a blank node). Then, the pstcn:requires predicate arc is followed to get the two required presentation bnodes. However, we’re interested only in the one whose pstcn:type is "stylesheet". Once we have that, then we’ll access the value of the stylesheet. The path I just highlighted in the text is also highlighted in the example.

Example 10-6. Using triple pattern as a filter
SELECT ?value
WHERE (?x, <pstcn:presentation>, ?resource),
(?resource, <pstcn:requires>, ?resource2),
(?resource2, <pstcn:type>, "stylesheet"),
(?resource2, <rdf:value>, ?value)
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

The result from running this query is:

http://burningbird.net/de.css

Exactly what we wanted to get.

I used a triple pattern to find the specific required presentation resource, rather than a conditional filter, because I wasn’t going to be querying among the end values—I’m actually modifying the query within the path to the end statement. If I wanted to find specific values using a conditional filter, I would list triple patterns up until I returned all of the statements of interest and then use the filter on these statements to find specific values.

A demonstration of this is shown in Example 10-7, where a date is returned for a movement with movement type of "Add". Notice that equality is denoted by the eq operator rather than using nonalphabetic characters such as ==, common in several programming languages.

Example 10-7. Returning date for movement of type “Add”
SELECT ?date
WHERE 
(?resource, <rdf:type>, <pstcn:Movement>),
(?resource, <pstcn:movementType>, ?value),
(?resource, <dc:date>, ?date)
AND (?value eq "Add")
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
      dc FOR <http://purl.org/dc/elements/1.1/>

Regardless of the complexity of the query, the Query-O-Matic should be able to process the results. Best of all, you can then take the query and add it to your own code and know that it’s been pretested.

However, if you’re not a big fan of Java, then you may be interested in the PHP version of Query-O-Matic, Query-O-Matic Lite.

PHP Query-O-Matic Lite

If you’ve worked with PHP and with XML, then you’re familiar with the PHP XML classes. These classes provide functionality to process virtually all popular uses of XML, including RDF/XML. The two packages of interest in this chapter are RDQL and RDQL_DB.

Tip

The PHP XML classes main web page is at http://phpxmlclasses.sourceforge.net/. This section assumes you are familiar with working with PHP.

As you can imagine from the package names, RDQL provides RDQL query capability within the PHP environment, and RDQL_DB provides persistent support for it. They’re both so complete that the PHP version of Query-O-Matic (Lite) took less than 10 lines of code, hence the Lite designation. But before we look at that, let’s take a close look at the classes themselves.

There are four classes within the RDQL package, but the one of interest to us is RDQL_query_document. This class has one method, rdql_query_url, which takes a query string and returns an array of associative arrays with the results of the query. The RDQL_DB package provides two classes of particular importance to this chapter: RDQL_db, which controls all database actions, and RDQL_query_db, which acts the same as RDQL_query_document, taking a string and returning the results of a query as an array of results. RDQL_DB makes use of RDQL for query parsing and other shared functionality.

To use RDQL_DB, you’ll need to preload the database structure required by the package. This is found in a file called rdql_db.sql in the installation. At this time, only MySQL is supported, and the file is loaded at the command line:

mysql databasename < rdql_db.sql

Tip

You must, of course, have the ability to modify the database in order to create tables in it. Follow the MySQL documentation if you have problems loading the RDQL tables.

The RDQL table structure is quite simple. Two tables are created: rdf_data contains columns for each member of an RDF triple as well as information about each, and rdf_documents keeps track of the different RDF/XML documents that are loaded into the database. Unlike the PHP classes discussed in Chapter 9, the PHP RDQL and RDQL_DB packages provide functionality to parse, load, and persist existing RDF/XML documents and to use RDQL to query them, but neither provides functionality to modify or create an RDF/XML document.

At the time of this writing, the PHP XML classes had not been updated to include the new RDF/XML constructs. Because of this, the example RDF/XML document used for most of the book, monsters1.rdf, can’t be parsed cleanly. Instead, another RDF/XML document was used. This document is reproduced in Example 10-8 so that you can follow the demonstration more easily.

Example 10-8. Resume RDF/XML document
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:bbd="http://burningbird.net/resume/elements/1.0/"
  xml:base="http://burningbird.net/shelley_powers/resume/" >

  <rdf:Description rdf:about="http://burningbird.net/shelley_powers/">
     <bbd:bio rdf:resource="bio"/>
     <bbd:job rdf:resource="job" />
     <bbd:education rdf:resource="education" />
     <bbd:experience rdf:resource="experience" />
     <bbd:skills rdf:resource="skills" />
     <bbd:references rdf:resource="references" />
     
  </rdf:Description>

  <rdf:Description rdf:about="bio">

     <bbd:firstname>Shelley</bbd:firstname>
     <bbd:lastname>Powers</bbd:lastname>
     <bbd:city>St. Louis</bbd:city>
     <bbd:state>Missouri</bbd:state>
     <bbd:country>US</bbd:country>
     <bbd:homephone> - </bbd:homephone>
     <bbd:mobile> - </bbd:mobile>
     <bbd:workphone> - </bbd:workphone>
     <bbd:email>[email protected]</bbd:email>
   </rdf:Description>

  <rdf:Description rdf:about="job">
     <bbd:position>Software Engineer</bbd:position>
     <bbd:position>Technical Architect</bbd:position>
     <bbd:experience>16+ years</bbd:experience>
     <bbd:permorcontract>Contract</bbd:permorcontract>
     <bbd:start>2002-09-29</bbd:start>
     <bbd:relocate>No</bbd:relocate>
     <bbd:travel>yes</bbd:travel>
     <bbd:location>St. Louis, Missouri</bbd:location>
     <bbd:status>full</bbd:status>
     <bbd:rateusdollars>100</bbd:rateusdollars>
     <bbd:unit>hour</bbd:unit>
     <bbd:worklocation>both</bbd:worklocation>
     <bbd:idealjob>I'm primarily interested in contract positions with a 
                   fairly aggressive schedule; I like to be in an energetic 
                   environment. My preferred work is technology architecture, 
                   but I'm also a hands-on senior software developer.
     </bbd:idealjob>
     
   </rdf:Description>

  <rdf:Description rdf:about="education">
      <rdf:_1>
        <rdf:Description rdf:about="degree1">
          <bbd:degree>AA</bbd:degree>
          <bbd:discipline>Liberal Arts</bbd:discipline>
          <bbd:date>1981-06-01</bbd:date>
          <bbd:gpa>3.98</bbd:gpa>
          <bbd:honors>High Honors</bbd:honors>
          <bbd:college>Yakima Valley Community College</bbd:college>
          <bbd:location>Yakima, Washington</bbd:location>
        </rdf:Description>
      </rdf:_1>
      <rdf:_2>
        <rdf:Description rdf:about="degree2">
          <bbd:degree>BA</bbd:degree>
          <bbd:discipline>Psychology</bbd:discipline>
          <bbd:date>1986-06-01</bbd:date>
          <bbd:gpa>3.65</bbd:gpa>
          <bbd:honors>Magna cum laude</bbd:honors>
          <bbd:honors>Dean's Scholar</bbd:honors>
          <bbd:college>Central Washington University</bbd:college>
          <bbd:location>Ellensburg, Washington</bbd:location>
        </rdf:Description>
      </rdf:_2>
      <rdf:_3>
        <rdf:Description rdf:about="degree3">
          <bbd:degree>BS</bbd:degree>
          <bbd:discipline>Computer Science</bbd:discipline>
          <bbd:date>1987-06-01</bbd:date>
          <bbd:gpa>3.65</bbd:gpa>
          <bbd:college>Central Washington University</bbd:college>
          <bbd:location>Ellensburg, Washington</bbd:location>
        </rdf:Description>
      </rdf:_3>
  </rdf:Description>


  <rdf:Description rdf:about="experience">
     <rdf:_1>
        <rdf:Description rdf:about="job1">
           <bbd:company>Boeing</bbd:company>
           <bbd:title>Data Architect</bbd:title>
           <bbd:title>Information Repository Modeler</bbd:title>
           <bbd:title>Software Engineer</bbd:title>
           <bbd:title>Database Architect</bbd:title>
           <bbd:start>1987</bbd:start>
           <bbd:end>1992</bbd:end>
           <bbd:description>
At Boeing I worked as a developer for the Peace Shield Project (FORTRAN/Ingres on VAX/
VMS).  Peace Shield is Saudi Arabia's air defense system. At the end of the project, I 
moved into a position of Oracle DBA and provided support for various organizations.  I 
worked with Oracle versions 5.0 and 6.0, and with SQL Forms, Pro*C, and OCI. I was also 
interim information modeler for Boeing Commercial's Repository, providing data modeling 
and design for this effort.
From the data group, I moved into my last position at Boeing, which was for the Acoustical
and Linguistics group, developing applications for Windows using Microsoft C, C++, the 
Windows SDK, and using Smalltalk as a prototype tool. The object-based applications we 
created utilized new speech technology as a solution to business needs including a speech 
driven robotic work order system.
           </bbd:description>
        </rdf:Description>
     </rdf:_1>
  </rdf:Description>

  <rdf:Description rdf:about="skills">
    <rdf:_1>
      <rdf:Description rdf:about="java">
       <bbd:level>Expert</bbd:level>
       <bbd:years>6</bbd:years>
       <bbd:lastused>now</bbd:lastused>
      </rdf:Description>
    </rdf:_1>
    <rdf:_2>
      <rdf:Description rdf:about="C++">
       <bbd:level>Expert</bbd:level>
       <bbd:years>8</bbd:years>
       <bbd:lastused>2 years ago</bbd:lastused>
      </rdf:Description>
    </rdf:_2>
  </rdf:Description>

</rdf:RDF>

Tip

The PHP XML classes may have been updated to reflect the most recent RDF specifications by the time this book is published.

To demonstrate both the persistence capability and the query functionality of the PHP XML classes, Example 10-9 shows a complete PHP page that opens a connection to the database, loads in a document, queries the data, and then removes the document from persistent storage.

Example 10-9. Application to read in resume RDF/XML document and run query against it
<?
mysql_connect("localhost","username","password");
mysql_select_db("databasename");
?>
<html>
<head>
  <title>RDQL PHP Example</title>
</head>
<body>
<?php
include_once("C:\\class_rdql_db\\class_rdql_db.php");

# read in, store document
$rdqldb = new RDQL_db(  );
$rdqldb->set_warning_mode(true);
$rdqldb->store_rdf_document("http://weblog.burningbird.net/resume.rdf","resume");
# build and execute query
$query='SELECT ?b
FROM <resume>
WHERE (?a, <bbd:title>, ?b)
USING bbd for <http://www.burningbird.net/resume_schema#>';

#parse and print results
$rows = RDQL_query_db::rdql_query_db($query);
if (!empty($rows)) {
   foreach($rows as $row) {
      foreach($row as $key=>$val) {
         print("$val<p>");
      }
   }
}
else {
   print("No data found");
}

# data dump and delete document from db
$data = $rdqldb->get_rdf_document("resume");
print("<h3>General dump of the data</h3>");
print($data);

$rdqldb->remove_rdf_document("resume");
?>
</body>
</html>

This example is running in a Windows environment, and the path to the PHP class is set accordingly. The method get_rdf_document returns the RDF/XML of the document contained within the database. To print out the elements as well as the data, modify the string before printing:

$data=str_replace("<","&lt;",$data);
$data=str_replace(">","&gt;",$data);
print ($data);
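The two replacements above leave ampersands untouched, so an entity already present in the RDF/XML is rendered by the browser rather than shown as source. A minimal sketch of an alternative, assuming a stock PHP installation, is the built-in htmlspecialchars function, which escapes the angle brackets, quotes, and ampersands in one call. The sample string below is made up and merely stands in for the value returned by get_rdf_document:

```php
<?php
// Made-up sample standing in for the string returned by get_rdf_document.
$data = '<bbd:title>Research &amp; Development</bbd:title>';

// Escape <, >, &, and both quote styles so the browser shows the markup
// itself instead of interpreting it.
$escaped = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');

// <pre> preserves the line breaks and indentation of the RDF/XML.
print("<pre>$escaped</pre>");
```

With this approach the existing entity is itself escaped, so the page displays the document exactly as it is stored.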

As the example demonstrates, parsing and querying an RDF/XML document with the PHP XML classes is quite simple, which is one of the advantages of a consistent metadata storage and query language.

The code for Query-O-Matic Lite is even simpler. The first page with the HTML form has just one field, querystr, a textarea input field. When the form is submitted, the second page accesses this string, strips out any slashes, and then passes the string directly to the PHP class to process the query, as is shown in Example 10-10. In this example, the RDQL class is used and the document is opened directly via URL, rather than being persisted to a database first. In addition, unlike Query-O-Matic, Lite allows multiple variables in the select clause—each is printed out with spaces in between, and each row is printed on a separate line.

Example 10-10. Code for PHP RDF/XML Query-O-Matic Lite
<html>
<head>
  <title>RDQL Query-O-Matic Lite</title>
</head>
<body>
<?php

include_once("class_rdql.php");
$querystr=stripslashes($_GET['querystr']);
$rows = RDQL_query_document::rdql_query_url($querystr);
if (empty($rows)) die("No data found for your query");

foreach($rows as $row) {
      foreach($row as $key=>$val) {
        print("$val ");
      }
  print ("<br /><br />");
  }
?>
</body>
</html>

Even accounting for the HTML in the example, Query-O-Matic Lite is one of the smallest PHP applications I’ve created. Small as it is, as long as the underlying RDF/XML parser (class_rdf_parser) can parse the RDF/XML, you can run queries against the data.
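Because Query-O-Matic Lite prints whatever values a query returns, a literal containing markup would be interpreted by the browser. The following is a rough sketch of one way to guard against that. The helper format_rows is hypothetical and not part of the PHP XML classes, and the sample rows are made up; the only assumption about the classes is the row shape the example already loops over, a list of variable => value arrays:

```php
<?php
// Hypothetical helper, not part of the PHP XML classes: formats query rows
// the way Example 10-10 does, but escapes each value first.
function format_rows(array $rows): string
{
    $html = '';
    foreach ($rows as $row) {
        $cells = [];
        foreach ($row as $val) {
            // Escape markup characters so literals display as plain text.
            $cells[] = htmlspecialchars((string) $val, ENT_QUOTES, 'UTF-8');
        }
        // Values separated by spaces, one row per line, as in the example.
        $html .= implode(' ', $cells) . "<br /><br />\n";
    }
    return $html;
}

// Made-up rows in the assumed shape: a list of variable => value arrays.
$rows = [
    ['?degree' => 'BA', '?discipline' => 'Psychology'],
    ['?degree' => 'BS', '?discipline' => 'Computer <Science>'],
];
print(format_rows($rows));
```

In the Lite page, the nested foreach loops could be replaced with a single print(format_rows($rows)) call.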

Figure 10-3 shows the first page of Query-O-Matic Lite, with an RDQL query typed into the query input text box.

Figure 10-3. Entering an RDQL query into the Query-O-Matic

The query, shown in Example 10-11, accesses all degrees and disciplines within the document and prints them out.

Example 10-11. RDQL query accessing disciplines and degrees from resume RDF/XML document
SELECT ?degree, ?discipline
FROM <http://weblog.burningbird.net/resume.rdf>
WHERE (?a, <bbd:discipline>, ?discipline),
      (?a, <bbd:degree>, ?degree)
USING bbd for <http://burningbird.net/resume/elements/1.0/>

The results of running this query are:

AA Liberal Arts 
BA Psychology 
BS Computer Science

The PHP XML classes also support conditional and Boolean operators for filtering data once a subset has been found with the triple patterns. The set of operators differs from the one Jena supports, though, because RDQL has not yet been standardized across implementations. In addition, you can list more than one document in the from/source clause, and the data from all of them is then available for the query.

I loaded several RDF/RSS files (for more on RSS, see Chapter 13) from my web sites and then created a query that searched for all entries after a certain time (the start of 2003) and printed out the date/timestamp, title, and link to the article. Example 10-12 contains the RDQL for this query.

Example 10-12. Complex RDQL query
SELECT ?date, ?title, ?link
FROM <http://weblog.burningbird.net/index.rdf>
     <http://articles.burningbird.net/index.rdf>
     <http://rdf.burningbird.net/index.rdf>
WHERE (?a, <rdf:type>, <rss:item>),
      (?a, <rss:title>, ?title),
      (?a, <rss:link>, ?link),
      (?a, <dc:date>, ?date)
AND ?date > '2002-12-31'
USING rss for <http://purl.org/rss/1.0/>,
      dc for <http://purl.org/dc/elements/1.1/>

The data from all RDF/XML files was joined, the query made and filtered, and the resulting output met my expectations. Not only that, but the process was quite quick, as well as incredibly easy—a very effective demonstration of the power of RDF, RDF/XML, and RDQL.
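To make the mechanics concrete, here is a minimal sketch of what a query like the one in Example 10-12 does conceptually: each triple pattern is matched against the merged triples, patterns that share a variable such as ?a are joined on it, and the AND condition filters the joined rows. This is not how the PHP XML classes are implemented. The functions match_pattern and run_query, the prefixed names treated as plain strings, and the sample data are all hypothetical:

```php
<?php
// Made-up triples, as if merged from several RSS files:
// each entry is [subject, predicate, object].
$triples = [
    ['item1', 'rdf:type',  'rss:item'],
    ['item1', 'rss:title', 'First post'],
    ['item1', 'dc:date',   '2002-11-15T08:00:00'],
    ['item2', 'rdf:type',  'rss:item'],
    ['item2', 'rss:title', 'Second post'],
    ['item2', 'dc:date',   '2003-01-20T09:30:00'],
];

// Extend every existing binding with each triple that fits the pattern.
// Pattern terms starting with "?" are variables; anything else must match
// the triple exactly.
function match_pattern(array $triples, array $pattern, array $bindings): array
{
    $results = [];
    foreach ($bindings as $binding) {
        foreach ($triples as $triple) {
            $candidate = $binding;
            $ok = true;
            foreach ($pattern as $i => $term) {
                if ($term[0] === '?') {
                    if (!isset($candidate[$term])) {
                        $candidate[$term] = $triple[$i];   // bind a new variable
                    } elseif ($candidate[$term] !== $triple[$i]) {
                        $ok = false;                       // join condition fails
                        break;
                    }
                } elseif ($term !== $triple[$i]) {
                    $ok = false;                           // constant does not match
                    break;
                }
            }
            if ($ok) {
                $results[] = $candidate;
            }
        }
    }
    return $results;
}

// Apply the patterns in order, then keep only rows that pass the filter.
function run_query(array $triples, array $patterns, callable $filter): array
{
    $bindings = [[]];   // start with a single empty binding
    foreach ($patterns as $pattern) {
        $bindings = match_pattern($triples, $pattern, $bindings);
    }
    return array_values(array_filter($bindings, $filter));
}

$rows = run_query($triples, [
    ['?a', 'rdf:type',  'rss:item'],
    ['?a', 'rss:title', '?title'],
    ['?a', 'dc:date',   '?date'],
], function (array $row): bool {
    // ISO 8601 dates sort correctly when compared as strings.
    return strcmp($row['?date'], '2002-12-31') > 0;
});

foreach ($rows as $row) {
    print($row['?date'] . ' ' . $row['?title'] . "\n");
}
```

With the sample data, only the second item passes the date filter. A real engine would index the triples rather than scan them for every pattern, but the join-then-filter shape is the same.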

Sesame

Sesame is, to quote the web site where it’s supported, “...an Open Source RDF Schema-Based Repository and Querying Facility.” It’s a Java JSP/Servlet application that I downloaded and installed on my Windows box, running it with a standalone Tomcat server (Version 4.1.18).

Tip

The Sesame web site, including source for the product and documentation, is at http://sesame.aidministrator.nl/.

Once I worked through an installation problem having to do with an extraneous angle bracket in the web.xml file definition for an Oracle database installation (something the creators of Sesame have said will be fixed), getting the application to run was a piece of cake—just start Tomcat.

I installed Sesame with support for MySQL. Once I started it (see instructions), the first thing I did was load in the monsters1.rdf test document, accessed through the URL online. The document loaded fairly quickly, though the tool didn’t provide feedback that it was finished loading.

After loading, I explored the database entries by accessing the Explore menu option (at the top of the page) and then specifying http://burningbird.net/articles/monsters1.htm as the URI to start the exploration with (the top-level resource for the test document). The page that opened is shown in Figure 10-4. Quite a nice layout, with each predicate/object defined as a hypertext link that takes you to more information about the object. Like BrownSauce, covered in Chapter 7, Sesame provides a nice RDF/XML browser.

Figure 10-4. RDF/XML test document, explored in Sesame

Two other options at the top of the Sesame page allow you to query the data using RDQL (the same RDQL explored in this chapter) or using Sesame’s RQL (RDF Query Language). I accessed the RDQL page first and tried the RDQL query defined earlier in Example 10-7:

SELECT ?date
WHERE 
(?resource, <rdf:type>, <pstcn:Movement>),
(?resource, <pstcn:movementType>, ?value),
(?resource, <dc:date>, ?date)
AND (?value eq "Add")
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
      dc for <http://purl.org/dc/elements/1.1/>

Note that this query looks for the date (dc:date) of each resource movement whose movement type is "Add", meaning the resource was added. Figure 10-5 shows the result of running this query, which was evaluated in an amazingly short amount of time, seemingly instantaneously.

Figure 10-5. Running RDQL query and viewing the result

RQL is similar in concept to RDQL, though not surprisingly it has a different syntax, as well as different features and functionality. For instance, using the online repository querying capability, you can easily find all RDF classes within the repository just by typing Class as the query (by itself with no other characters). For the test document, the result is:

http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
http://www.w3.org/2000/01/rdf-schema#Resource
http://www.w3.org/2000/01/rdf-schema#Literal
http://www.w3.org/2000/01/rdf-schema#Class
http://burningbird.net/postcon/elements/1.0/Resource
http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq
http://burningbird.net/postcon/elements/1.0/Movement

The PostCon classes of Movement and Resource are found, as are the RDF classes of Seq and Property and the RDFS classes of Resource, Literal, and Class. A variation of this query is Property, to get a listing of all properties in the repository.

To get more selective in your information querying, to find the source and target for a specific property, you would provide the full URI of the property. For instance, to find the source and target for the predicate movementType, I typed in the following:

http://burningbird.net/postcon/elements/1.0/movementType

This returned the following:

http://www.yasd.com/dynaearth/monsters1.htm  "Add"
http://www.dynamicearth.com/articles/monsters1.htm "Move"
http://burningbird.net/articles/monsters1.htm "Move"

As with RDQL, you can build complex queries using joins and conditional operations. It’s here that there’s a great deal of similarity between RDQL and RQL. In the following, the source and target for the movementType property are queried using a more formal, SQL-like syntax similar to the one RDQL uses:

select X, Y
from {X} http://burningbird.net/postcon/elements/1.0/movementType {Y}

Conditional operators are provided in a where clause following the select and from clauses. The following query finds a specific source whose movementType is equal to "Add":

select X
from {X} http://burningbird.net/postcon/elements/1.0/movementType {Y}
where Y = "Add"

To join queries, use a period between the query results. In the following RQL query, all objects that have a property of http://burningbird.net/postcon/elements/1.0/related are queried and then joined with another query that finds the titles of the related resources:

select *
from http://burningbird.net/postcon/elements/1.0/related {X}. http://purl.org/dc/elements/1.1/title {Y}

The result from this query is:

http://burningbird.net/articles/monsters2.htm  "Cryptozooloy"
http://burningbird.net/articles/monsters3.htm "A Tale of Two Monsters: Architeuthis Dux (Giant Squid)"
http://burningbird.net/articles/monsters4.htm "Nessie, the Loch Ness Monster "

You can see a great deal of similarity between the two query languages, and I like both equally well, though I’ll admit to a slight preference for the simplicity of RQL.

Of course, being able to query a repository via a predefined interface isn’t going to help you build an application. Sesame comes with a Java API for both server and client functions, including being able to run RDQL and RQL queries against the repository. I won’t cover either in this chapter, as both are quite nicely documented at the Sesame web site, and documentation is included with the downloaded product.

One additional feature of Sesame is the repository’s support for different protocols for querying the data, using SOAP and Java RMI in addition to invoking services using HTTP. Again, these are very well documented, including examples, at the Sesame site and in the downloaded product. In addition, as was mentioned earlier in the chapter, you can also use the Sesame repository as the persistent datastore with the Jena Java API.
