RDF as a model for metadata and RDF/XML as a way of serializing the model are interesting, but the power of the specifications lies in our ability to access the data easily, using techniques we’re familiar with from other data models, such as the relational data model discussed in Chapter 6.
It is only natural that techniques used for one data model should be adapted for use with another; so the method for accessing the relational data model, Structured Query Language (SQL), is used in a similar manner with RDF/XML through language techniques such as SquishQL, RDQL, RQL, and others.
Many of the query languages and schemas mentioned in this chapter are also covered in an online document at http://www.w3.org/2001/11/13-RDF-Query-Rules. In addition, if your interest is more inclined to RDF as data (or to the more logical side of RDF), check out the www-rdf-rules discussion list at http://lists.w3.org/Archives/Public/www-rdf-rules/.
RDF and the relational data model are both metadata models, so it’s natural to want to see how the one can work with the other. Stanford took a look at different designs of tables for storing RDF data in an online paper located at http://www-db.stanford.edu/~melnik/rdf/db.html. With some differences based on data types and the ability to store multiple models, most of the schemas demonstrated were basically the same—store the model as triples, with or without support for additional information such as namespace or model identifier.
An updated document comparing RDBMS and Semantic Web data is located at http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/.
If you look at implementations that store RDF within relational databases, these simple overlay schemas are used, for the most part, by all of them. For instance, Jena gives you a couple of different options in database storage; the first is whether multiple models are supported, and the second is whether a hash is used to generate the identifiers for the resources. However, the basic structure of the database is the same—a table for storing statements, with secondary tables storing literals (which could get quite large), resources, and namespaces.
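To make the overlay idea concrete, here is a minimal Python sketch of such a layout using the standard sqlite3 module. It is not Jena’s actual schema: the table names, the column names, and the sample statements are all assumptions chosen for illustration. It only shows the general shape described above, with a statements table that points into separate resource and literal tables and carries a model identifier.

```python
import sqlite3

# Hypothetical table and column names; real stores such as Jena use their own layouts.
SCHEMA = """
CREATE TABLE resources  (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
CREATE TABLE literals   (id INTEGER PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE statements (
    model     TEXT NOT NULL,                             -- lets one database hold several models
    subject   INTEGER NOT NULL REFERENCES resources(id),
    predicate INTEGER NOT NULL REFERENCES resources(id),
    obj_res   INTEGER REFERENCES resources(id),          -- set when the object is a resource
    obj_lit   INTEGER REFERENCES literals(id)            -- set when the object is a literal
);
"""

def resource_id(db, uri):
    """Return the row id for a URI, inserting the URI on first use."""
    db.execute("INSERT OR IGNORE INTO resources (uri) VALUES (?)", (uri,))
    return db.execute("SELECT id FROM resources WHERE uri = ?", (uri,)).fetchone()[0]

def add_literal_statement(db, model, subject, predicate, text):
    """Store one triple whose object is a literal."""
    lit_id = db.execute("INSERT INTO literals (value) VALUES (?)", (text,)).lastrowid
    db.execute(
        "INSERT INTO statements (model, subject, predicate, obj_lit) VALUES (?, ?, ?, ?)",
        (model, resource_id(db, subject), resource_id(db, predicate), lit_id),
    )

def literal_objects(db, model, predicate):
    """Return every literal object stored for a predicate in one model."""
    rows = db.execute(
        """SELECT l.value FROM statements s
           JOIN resources p ON p.id = s.predicate
           JOIN literals  l ON l.id = s.obj_lit
           WHERE s.model = ? AND p.uri = ?
           ORDER BY l.id""",
        (model, predicate),
    )
    return [value for (value,) in rows]

# Illustrative data only (assumed, not read from the real document).
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
PAGE = "http://burningbird.net/articles/monsters1.htm"
add_literal_statement(db, "monsters", PAGE, DC_SUBJECT, "legends")
add_literal_statement(db, "monsters", PAGE, DC_SUBJECT, "giant squid")
print(literal_objects(db, "monsters", DC_SUBJECT))
```

Keeping literals in their own table means the statements table stays narrow even when individual literal values run to thousands of characters.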
Siderean Software’s Seamark server (covered in Chapter 15) also uses a basic layout for storing its data, with separate tables for resource and literal and another table pulling together the triples (in addition to specific information about accessing the model). However, other applications, such as Plugged In Software’s Tucana Knowledge Store, use a data storage schema that is built from the ground up based on RDF, and make no use of relational data stores at all.
Another online white paper that discusses the relational data model and RDF directly is “Relational Databases and the Semantic Web” at http://www.w3.org/DesignIssues/RDB-RDF.html.
One of the earliest persistent data stores for RDF was R.V. Guha’s rdfDB, a database built from the ground up to store RDF data. This database, written in C and primarily tested within a Linux environment, uses a specialized language derived from SQL, a language he called “...a high level SQLish query language,” to manipulate and query RDF data within the database.
You can download a copy of rdfDB at http://guha.com/rdfdb/. Note that there has been little activity with this database in the last few years; I’m including coverage of it here primarily for historical perspective.
In Guha’s language, you can create a database, insert or delete rows from it, and query it. A row in his language would be an RDF triple, in the format of arc-source-target, somewhat different from N-Triples and other languages that portray an RDF triple as source-arc-target. However, the principles are the same.
For instance, to insert a row, use the following syntax (taken from Guha’s sample session online):
insert into test1 (type DanB Person), (name DanB 'Dan Brickley') </>
If the result is successful, the database returns 0; otherwise, a negative value representing the type of error that occurred with the statement is returned.
The data is queried by forming a select statement that provides a variable or variables for resulting data, a from clause giving the database name, and a where clause made up of triples in the format of arc-source-target, with placeholders in the position of unknown values. Again from the sample he provides at his web site:
select ?x from test1 where (worksFor ?x W3C) (name ?x ?y) </>
The results are returned on separate lines, variables mapped to values:
?x = DanC ?y = 'Dan Connolly'
?x = DanB ?y = 'Dan Brickley'
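The way a query like this is answered can be sketched in a few lines of Python. This is not rdfDB’s code, only an illustration of the idea: each pattern is matched against every arc-source-target row, and a variable bound by one pattern constrains the next. The rows are assumed contents for test1, chosen to be consistent with the sample output, and the sketch projects both ?x and ?y to mirror that output.

```python
def unify(pattern, row, binding):
    """Match one (arc, source, target) pattern against one row.

    Returns an extended copy of binding, or None if the row does not fit.
    """
    extended = dict(binding)
    for term, value in zip(pattern, row):
        if term.startswith("?"):
            # A variable takes the row's value unless already bound to another.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def select(variables, rows, patterns):
    """Join every pattern against the rows, then project the variables."""
    bindings = [{}]
    for pattern in patterns:
        matched = []
        for binding in bindings:
            for row in rows:
                extended = unify(pattern, row, binding)
                if extended is not None:
                    matched.append(extended)
        bindings = matched
    return [tuple(b[v] for v in variables) for b in bindings]

# Assumed rows in arc-source-target order (hypothetical database contents).
ROWS = [
    ("worksFor", "DanC", "W3C"),
    ("name", "DanC", "Dan Connolly"),
    ("type", "DanB", "Person"),
    ("worksFor", "DanB", "W3C"),
    ("name", "DanB", "Dan Brickley"),
]

PATTERNS = [("worksFor", "?x", "W3C"), ("name", "?x", "?y")]
for x, y in select(["?x", "?y"], ROWS, PATTERNS):
    print("?x = %s ?y = '%s'" % (x, y))
```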
Though Guha’s rdfDB was the precursor to much of the effort in querying RDF, he hasn’t worked on the database recently. However, others took up the effort he pioneered and have since worked to enhance and improve on it. Among these are the Inkling database and SquishQL, an open source effort that included contributions from Leigh Dodds, Libby Miller, and Dan Brickley.
Unlike rdfDB, written in C in a Linux environment, the Inkling database was written in Java, originally on Linux and Solaris and most recently hosted and tested on Mac OS X, using several Java JDBC classes. Though I’ve tried it only on the Mac OS X environment myself, it should work in other environments that have Java installed. An additional requirement for Inkling is an installation of PostgreSQL, as it uses this database for persistent storage (unlike rdfDB, which manages its own storage).
You can view documentation and test the Inkling database online at http://swordfish.rdfweb.org/rdfquery/. You can also download source code for Inkling at this site. Note that Inkling uses PostgreSQL for its persistent data store. If you don’t want to install Inkling to your own system, you can also use the online test application, running it against your own persisted RDF/XML documents available on the Web.
Once you’ve downloaded the Inkling installation file, you’ll first need to make sure that you have a database called test created, and that you’ve run the SQL commands contained in the inklingsqlschema.psql file. You’ll also need to set JAVA_HOME. In the Mac OS X environment, JAVA_HOME is set to /Library/Java/Home if you’re using the Java installations that are designed specifically for Mac OS X.
The data structure loaded into the PostgreSQL database is relatively simple—one table containing pointers (hashed values) to the actual values in a second table. A flag specifies if the value is a resource or an actual object. If I have anything to disagree with about this design, it’s the combination of resources and objects in one table. Resource URIs are typically Unicode character strings most likely not more than a few hundred characters or so in length. Objects (literals), though, can be large. My test file used in many of the other examples in this book (http://burningbird.net/articles/monsters1.rdf ) has objects that can be several thousand characters in length. Normally, a better design would have been to separate out the known resources into a separate table or even two tables—one for predicates, one for subjects. However, that’s a personal preference.
You can access several demonstration applications installed with Inkling or the online application. You can also use a set of Java classes that support the application directly. Of particular interest among these is a JDBC driver created specifically for Inkling-formatted data, allowing you to run SquishQL-formatted queries against data held in a PostgreSQL database. However, we’re more interested at this point in the queries, which we’ll focus on in the rest of this section.
The example file used throughout this chapter is from Example 6-6 — monsters1.rdf.
The SquishQL supported in Inkling has strong ties to SQL. A simple query is similar to the following:
SELECT ?subject
FROM http://burningbird.net/articles/monsters1.rdf
WHERE (dc::subject ?x ?subject)
USING dc FOR http://purl.org/dc/elements/1.1/
In this query, triples form a where clause, leading with the predicate, followed by subject and then by object. If the query uses a variable as placeholder, all values in that field are returned. For this example, all dc:subject predicates are returned regardless of specific subject or object value.
The query is being made against a file rather than the default database (and can be accessed remotely via a URL), which is noted in the FROM clause. The SELECT clause lists the value or values returned, and the USING clause gives a mapping between the predicate URI and the abbreviation for the URI. It’s important to note that the using clause isn’t a namespace prefix, but a way of providing abbreviations for longer URIs. This could mean a specific namespace but isn’t limited only to namespaces formally identified within the RDF/XML document.
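One way to picture the using clause is as plain string substitution. The Python sketch below is an assumption about the behavior, not Inkling’s implementation: it replaces an abbrev::name term with the mapped URI stem followed by the name, and leaves variables and anything without a known abbreviation alone. The function name expand is hypothetical.

```python
def expand(term, using):
    """Expand an 'abbrev::name' term using the USING table.

    Terms without a known abbreviation (variables, full URIs) pass through.
    """
    abbrev, separator, name = term.partition("::")
    if separator and abbrev in using:
        # Plain concatenation: the mapped string need not be a declared namespace.
        return using[abbrev] + name
    return term

USING = {"dc": "http://purl.org/dc/elements/1.1/"}

pattern = ("dc::subject", "?x", "?subject")
print(tuple(expand(term, USING) for term in pattern))
```

Because the mapping is only a string stem, nothing in the sketch checks that the stem was declared with xmlns in the RDF/XML document, which matches the point made above.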
The variables begin with a question mark and consist of characters, with no spaces. Figure 10-1 shows both this query and the output format as given in the Inkling online query application.
After submitting the form, a second page opens up displaying the results:
The subject is Loch Ness Monster
The subject is giant squid
The subject is legends
The subject is Architeuthis Dux
The subject is Nessie
You can also make more complex queries. For instance, to find all uses of pstcn:reason associated with movements, rather than with related resources, you can join query triples to return specific predicates for given resources that are themselves identified by other predicates; in this case, a predicate of rdf:type of http://burningbird.net/postcon/elements/1.0/Movement, as shown in Example 10-1.
SELECT ?resource ?value
FROM http://burningbird.net/articles/monsters1.rdf
WHERE (rdf::type ?resource "http://burningbird.net/postcon/elements/1.0/Movement")
      (pstcn::reason ?resource ?value)
USING pstcn FOR http://burningbird.net/postcon/elements/1.0/
      rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
In this example, the first triple looks for all resources with a given rdf:type of http://burningbird.net/postcon/elements/1.0/Movement. These are then passed into the second triple in the subject field, fine-tuning the reasons returned to those associated with movement resources. In the example, predicates from two namespaces are used, as shown in the using clause. In addition, two values are returned in the select clause and printed out:
The reason for the movement to http://www.dynamicearth.com/articles/monsters1.htm is Moved to separate dynamicearth.com domain
The reason for the movement to http:/burningbird.net/articles/monsters1.htm is Collapsed into Burningbird
The reason for the movement to http://www.yasd.com/dynaearth/monsters1.htm is New Article
This combining of triple patterns is known as following one specific path within an RDF model, of node-arc-node-arc-node and so on. You can add additional triple patterns to travel further down the path until you reach the data you’re after, no matter how deeply nested within the model. The key is to use a variable assigned data in one triple pattern—such as a subject or object value—as one of the constraints in the next triple pattern and so on.
In addition to filtering based on triple pattern matching, you can also use more traditional query constraints such as the less-than (<) and greater-than (>) operators and equality (= and ~). All of the comparison operators work with integers except for the string equality operator (~).
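The difference between the integer operators and string equality is easy to see in a small Python sketch. This is my reading of the behavior described above rather than Inkling’s code, and the bindings and the ?count variable are hypothetical: <, >, and = convert both sides to integers before comparing, while ~ compares the strings exactly as written.

```python
import operator

# Integer operators compare numerically; "~" compares the raw strings.
INTEGER_OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def satisfies(binding, variable, op, operand):
    """Return True if the value bound to variable passes one constraint."""
    value = binding[variable]
    if op == "~":
        return value == operand
    return INTEGER_OPS[op](int(value), int(operand))

def apply_constraint(bindings, variable, op, operand):
    """Keep only the bindings that pass the constraint."""
    return [b for b in bindings if satisfies(b, variable, op, operand)]

# Hypothetical results of the triple patterns, before any constraint is applied.
BINDINGS = [
    {"?resource": "a", "?count": "07"},
    {"?resource": "b", "?count": "12"},
]

print(apply_constraint(BINDINGS, "?count", ">", "10"))  # numeric comparison
print(apply_constraint(BINDINGS, "?count", "=", "7"))   # 07 equals 7 as integers
print(apply_constraint(BINDINGS, "?count", "~", "7"))   # but "07" is not the string "7"
```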
In Example 10-2, the string equality operator is used to return a resource from a movement on a specific date.
SELECT ?resource
FROM http://burningbird.net/articles/monsters1.rdf
WHERE (rdf::type ?resource "http://burningbird.net/postcon/elements/1.0/Movement")
      (dc::date ?resource ?date)
AND ?date ~ "1999-10-31:T00:00:00-05:00"
USING pstcn FOR http://burningbird.net/postcon/elements/1.0/
      rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
      dc FOR http://purl.org/dc/elements/1.1/
The example just shown is about as complex as an RDF query gets, regardless of the specific query language. Variations simply add constraints, namespaces, sources (such as multiple documents), and so on, but the basic structure remains the same:
SELECT variables
FROM source
WHERE (triple clause)
USING namespace mapping
The type of query language demonstrated, beginning with rdfDB and continuing with SquishQL, is the one that’s formed the basis of one of the more popular RDF/XML query languages, RDQL, demonstrated in the next section.
The RDQL language is based on the earlier work of Guha’s RDFDB QL and SquishQL, with some relatively minor differences. Its popularity is ensured because of its use within Jena, probably the most widely used RDF API.
RDQL supports the same select, from, where, and using clauses as SquishQL (with some exceptions). Additionally, RDQL can change based on the implementation and whether you’re using a Java API such as Jena, a PHP class such as the PHP XML classes, or a Perl module such as RDFStore. However, though the syntax varies within the clauses, the concepts remain the same.
Variables are in the format of a question mark, followed by other characters, just as in SquishQL:
?<identifier>
However, one difference between SquishQL and RDQL occurs in the select clause, which requires commas rather than spaces to separate all variables.
The from, or source, clause, can be omitted with RDQL depending on the implementation. For instance, in Jena, the source of the RDF/XML can be specified and loaded separately through a separate class method or can be given directly in the query. However, in the PHP RDF/XML classes, the from clause must be provided within the query. The same applies to RDFStore, which also requires that the URL be surrounded by angle brackets.
The where clause (or triple pattern clause) differs in that the pattern follows the more traditional subject-predicate-object ordering, and URIs are differentiated from literals by being surrounded by angle brackets. However, the way that triple patterns are combined to form more complex queries is the same in RDQL and SquishQL.
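The reordering is mechanical, as the following Python sketch suggests. It is a hypothetical helper, not part of any RDQL implementation: it takes a SquishQL-style pattern in predicate-subject-object order and writes it in RDQL’s subject-predicate-object order, wrapping URIs in angle brackets and rewriting the dc::subject style of abbreviation as dc:subject. One assumption to note: the sketch treats anything in double quotes as a literal, so a resource has to be passed in unquoted.

```python
def rdql_term(term):
    """Write one term in RDQL notation."""
    if term.startswith("?"):
        return term                      # variables are written the same way
    if term.startswith('"'):
        return term                      # quoted values are assumed to be literals
    return "<" + term.replace("::", ":") + ">"   # URIs get angle brackets

def to_rdql(pattern):
    """Convert a (predicate, subject, object) pattern to RDQL's ordering."""
    predicate, subject, obj = pattern
    return "(" + ", ".join(rdql_term(t) for t in (subject, predicate, obj)) + ")"

print(to_rdql(("dc::subject", "?x", "?subject")))
print(to_rdql(("rdf::type", "?resource", "pstcn::Movement")))
```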
RDQL has greater sophistication in incorporating comparison semantics with the triple pattern within the constraint clause. The use of AND is the same, but other operators, such as the OR operator (||), bitwise operators (& and |), and negation (!), are supported.
Within Jena, there is no using clause because the namespaces for the resources are included with the resource rather than being listed as a separate namespace. However, the PHP XML classes support using, as does RDFStore.
In addition to the rich set of Java classes that allow access to individual triples as well as the ability to build complex RDF/XML documents (as described in Chapter 8), Jena also provides specialized classes for use with RDQL:
Query: Manages the actual query, enabling the building of a query through an API or passed as a string.
QueryExecution: Query engine interface.
QueryEngine: The actual execution of the query (the intelligence behind the query process).
QueryResults: The iterator that manages the results.
ResultBinding: Mapping from variables to values.
In addition to these standard classes, newer implementations of Jena also support some newer classes, such as a QueryEngineSesame class, which works against the Sesame RDF repository (discussed at the end of the chapter).
The use of the classes is very straightforward. Use Query to build or parse the query, which is then passed to QueryEngine for processing. The engine is accessed through the QueryExecution interface, whose exec method returns the results as a QueryResults iterator. To access individual items in the results, each result is bound to program variables using ResultBinding.
To demonstrate how Jena works with RDQL, I created a dynamic query application, which I call the Query-O-Matic, building it in Java as a Tomcat JSP application.
The Query-O-Matic is a two-page application, with the first HTML page containing a form and the second JSP page processing the form contents. It’s built using Jena 1.6, and managed with Tomcat. The source code is included as part of the example code for the book.
The Query-O-Matic does require that you have knowledge of Tomcat and JSP-based applications. If you don’t, you can still work with the code, but you’ll need to provide a different interface for it. You can get more details about Jena’s RDQL support in the RDQL tutorial at http://www.hpl.hp.com/semweb/doc/tutorial/RDQL/index.html.
To create the application, the Jena .jar files must be copied to the common library or to the application-specific WEB-INF lib directory. I copied them to the common library location because I use Jena for several applications.
The first page is nothing special, an HTML form with three fields:
The first field is a text input field to hold the URL of the RDF/XML document.
The second field is a textarea to hold the actual query.
The third field is another text input field to hold the variable that’s printed out.
Figure 10-2 shows the page containing the form, as well as links to sample RDF/XML documents.
In the JSP page, the form values are pulled from the HTTP request. The URL is used to load the document; once it is loaded, the query is run against the document using the Jena QueryEngine class. To iterate through the results, another class, QueryResults, is created, and each record returned from the query is then bound to a specific object, in order to access a specific value. The result value that’s passed from the form is pulled from the object and the value is printed out, as shown in Example 10-3.
Once all values are processed, the result set is closed.
<html>
<%@ page import="com.hp.hpl.mesa.rdf.jena.mem.*,
                 java.io.File,
                 java.util.*,
                 com.hp.hpl.mesa.rdf.jena.model.*,
                 com.hp.hpl.mesa.rdf.jena.common.*,
                 com.hp.hpl.jena.util.*,
                 com.hp.hpl.jena.rdf.query.*,
                 com.hp.hpl.jena.rdf.query.parser.*" %>
<body>
<%
   ModelMem model;
   try {
      model = new ModelMem( );

      // pull the form values from the request
      String sUri = request.getParameter("uri");
      String sQuery = request.getParameter("query");
      String sResult = request.getParameter("result");

      // load the RDF/XML document into the in-memory model
      model.read(sUri);

      // query string
      Query query = new Query(sQuery);
      query.setSource(model);
      QueryExecution qe = new QueryEngine(query) ;
      QueryResults results = qe.exec( );

      out.print("<h1>test</h1>");

      // print the requested variable from each result
      for ( Iterator iter2 = results ; iter2.hasNext( ) ; ) {
         ResultBinding env = (ResultBinding)iter2.next( ) ;
         Object obj = env.get(sResult);
         out.print(obj.toString( ));
         out.print("<br>");
      }

      // close results
      results.close( ) ;

   } catch (Exception e) {
      out.print(e.toString( ));
   }
%>
<br>
</body>
</html>
Once the two pages and supporting Jena .jar files are installed into Tomcat, we’re ready to try out some RDQL in the Query-O-Matic.
The simplest test of the Query-O-Matic is to run an RDQL variation of the first query made with Inkling/SquishQL, which is to find all the dc:subject predicates in the RDF/XML document and print out the associated object values. The contents of the form are given in Example 10-4.
uri: http://burningbird.net/articles/monsters1.rdf
query: SELECT ?subject
       WHERE (?x, <dc:subject>, ?subject)
       USING dc FOR <http://purl.org/dc/elements/1.1/>
result: subject
Comparing this with the SquishQL example shows that both are basically the same with minor syntactic differences. When the form is submitted and the query processed, the results returned are exactly the same, too.
Another slightly more complicated query is shown in Example 10-5, which demonstrates traversing two arcs in order to find a specific value.
SELECT ?value
WHERE (?resource, <rdf:type>, <pstcn:Movement>),
      (?resource, <pstcn:reason>, ?value)
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Notice that object values that are resources are treated the same as the subject and predicate values, with angle brackets around the URI (or the QName). The only type of value that doesn’t have angle brackets is literals.
A slightly more complicated query more fully demonstrates the filtering capability of the triple pattern. To better understand how this query works, take a look at the N-Triples of the statements of the subgraph from the monsters1.rdf example:
<http://burningbird.net/articles/monsters1.htm> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://burningbird.net/postcon/elements/1.0/Resource> .
<http://burningbird.net/articles/monsters1.htm> <http://burningbird.net/postcon/elements/1.0/presentation> _:jARP10030 .
_:jARP10030 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP10032 .
_:jARP10032 <http://burningbird.net/postcon/elements/1.0/type> "logo" .
_:jARP10032 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/mm/dynamicearth.jpg" .
_:jARP10030 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP10031 .
_:jARP10031 <http://burningbird.net/postcon/elements/1.0/type> "stylesheet" .
_:jARP10031 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/de.css" .
These are the statements we’ll be querying with the code shown in Example 10-6. Within the query, the pstcn:presentation arc is followed from the main resource (monsters1.htm) to get the object/resource for it (a blank node). Then, the pstcn:requires predicate arc is followed to get the two required presentation bnodes. However, we’re interested only in the one whose pstcn:type is "stylesheet". Once we have that, then we’ll access the value of the stylesheet. The path I just highlighted in the text is also highlighted in the example.
SELECT ?value
WHERE (?x, <pstcn:presentation>, ?resource),
      (?resource, <pstcn:requires>, ?resource2),
      (?resource2, <pstcn:type>, "stylesheet"),
      (?resource2, <rdf:value>, ?value)
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
The result from running this query is:
http://burningbird.net/de.css
Exactly what we wanted to get.
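If it helps to see the same walk without a query engine, here is a short Python sketch that follows the path by hand over the eight statements listed earlier. It isn’t how Jena evaluates RDQL; it’s just the node-arc-node traversal written out, with the statements copied into a list of tuples. The helper names objects and stylesheet_values are made up for the illustration.

```python
PSTCN = "http://burningbird.net/postcon/elements/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PAGE = "http://burningbird.net/articles/monsters1.htm"

# The subgraph from monsters1.rdf as (subject, predicate, object) tuples.
TRIPLES = [
    (PAGE, RDF + "type", PSTCN + "Resource"),
    (PAGE, PSTCN + "presentation", "_:jARP10030"),
    ("_:jARP10030", PSTCN + "requires", "_:jARP10032"),
    ("_:jARP10032", PSTCN + "type", "logo"),
    ("_:jARP10032", RDF + "value", "http://burningbird.net/mm/dynamicearth.jpg"),
    ("_:jARP10030", PSTCN + "requires", "_:jARP10031"),
    ("_:jARP10031", PSTCN + "type", "stylesheet"),
    ("_:jARP10031", RDF + "value", "http://burningbird.net/de.css"),
]

def objects(subject, predicate):
    """Return the objects of every statement with this subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def stylesheet_values():
    """Follow presentation -> requires, keep the stylesheet node, return its value."""
    values = []
    for _subject, predicate, presentation in TRIPLES:
        if predicate != PSTCN + "presentation":
            continue
        for required in objects(presentation, PSTCN + "requires"):
            if "stylesheet" in objects(required, PSTCN + "type"):
                values.extend(objects(required, RDF + "value"))
    return values

print(stylesheet_values())
```

The test on pstcn:type sits in the middle of the walk, just as the third triple pattern does in the query, so the logo node is dropped before its value is ever read.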
I used a triple pattern to find the specific required presentation resource, rather than a conditional filter, because I wasn’t going to be querying among the end values—I’m actually modifying the query within the path to the end statement. If I wanted to find specific values using a conditional filter, I would list triple patterns up until I returned all of the statements of interest and then use the filter on these statements to find specific values.
A demonstration of this is shown in Example 10-7, where a date is returned for a movement with movement type of "Add". Notice that equality is denoted by the eq operator rather than using nonalphabetic characters such as ==, common in several programming languages.
SELECT ?date
WHERE (?resource, <rdf:type>, <pstcn:Movement>),
      (?resource, <pstcn:movementType>, ?value),
      (?resource, <dc:date>, ?date)
AND (?value eq "Add")
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
      dc for <http://purl.org/dc/elements/1.1/>
Regardless of the complexity of the query, the Query-O-Matic should be able to process the results. Best of all, you can then take the query and add it to your own code and know that it’s been pretested.
However, if you’re not a big fan of Java, then you may be interested in the PHP version of Query-O-Matic, Query-O-Matic Lite.
If you’ve worked with PHP and with XML, then you’re familiar with the PHP XML classes. These classes provide functionality to process virtually all popular uses of XML, including RDF/XML. The two packages of interest in this chapter are RDQL and RDQL_DB.
The PHP XML classes main web page is at http://phpxmlclasses.sourceforge.net/. This section assumes you are familiar with working with PHP.
As you can imagine from the package names, RDQL provides RDQL query capability within the PHP environment, and RDQL_DB provides persistent support for it. They’re both so complete that the PHP version of Query-O-Matic (Lite) took less than 10 lines of code, hence the Lite designation. But before we look at that, let’s take a close look at the classes themselves.
There are four classes within the RDQL package, but the one of interest to us is RDQL_query_document. This class has one method, rdql_query_url, which takes a query as a string and returns an array of associative arrays with the results of the query. The RDQL_DB package provides two classes of particular importance to this chapter: RDQL_db, which controls all database actions, and RDQL_query_db, which acts the same as RDQL_query_document, taking a string and returning the results of a query as an array of results. RDQL_DB makes use of RDQL for query parsing and other shared functionality.
To use RDQL_DB, you’ll need to preload the database structure required by the package. This is found in a file called rdql_db.sql in the installation. At this time, only MySQL is supported, and the file is loaded at the command line:
mysql databasename < rdql_db.sql
You must, of course, have the ability to modify the database in order to create tables in it. Follow the MySQL documentation if you have problems loading the RDQL tables.
The RDQL table structure is quite simple. Two tables are created: rdf_data contains columns for each member of an RDF triple as well as information about each, and rdf_documents keeps track of the different RDF/XML documents that are loaded into the database. Unlike the PHP classes discussed in Chapter 9, the PHP RDQL and RDQL_DB packages provide functionality to parse, load, and persist existing RDF/XML documents and to use RDQL to query them, but neither provides functionality to modify or create an RDF/XML document.
At the time of this writing, the PHP XML classes had not been updated to include the new RDF/XML constructs. Because of this, the example RDF/XML document used for most of the book, monsters1.rdf, can’t be parsed cleanly. Instead, another RDF/XML document was used. This document is reproduced in Example 10-8 so that you can follow the demonstration more easily.
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:bbd="http://burningbird.net/resume/elements/1.0/"
  xml:base="http://burningbird.net/shelley_powers/resume/" >
  <rdf:Description rdf:about="http://burningbird.net/shelley_powers/">
    <bbd:bio rdf:resource="bio"/>
    <bbd:job rdf:resource="job" />
    <bbd:education rdf:resource="education" />
    <bbd:experience rdf:resource="experience" />
    <bbd:skills rdf:resource="skills" />
    <bbd:references rdf:resource="references" />
  </rdf:Description>
  <rdf:Description rdf:about="bio">
    <bbd:firstname>Shelley</bbd:firstname>
    <bbd:lastname>Powers</bbd:lastname>
    <bbd:city>St. Louis</bbd:city>
    <bbd:state>Missouri</bbd:state>
    <bbd:country>US</bbd:country>
    <bbd:homephone> - </bbd:homephone>
    <bbd:mobile> - </bbd:mobile>
    <bbd:workphone> - </bbd:workphone>
    <bbd:email>[email protected]</bbd:email>
  </rdf:Description>
  <rdf:Description rdf:about="job">
    <bbd:position>Software Engineer</bbd:position>
    <bbd:position>Technical Architect</bbd:position>
    <bbd:experience>16+ years</bbd:experience>
    <bbd:permorcontract>Contract</bbd:permorcontract>
    <bbd:start>2002-09-29</bbd:start>
    <bbd:relocate>No</bbd:relocate>
    <bbd:travel>yes</bbd:travel>
    <bbd:location>St. Louis, Missouri</bbd:location>
    <bbd:status>full</bbd:status>
    <bbd:rateusdollars>100</bbd:rateusdollars>
    <bbd:unit>hour</bbd:unit>
    <bbd:worklocation>both</bbd:worklocation>
    <bbd:idealjob>I'm primarily interested in contract positions with a fairly
aggressive schedule; I like to be in an energetic environment. My preferred work is
technology architecture, but I'm also a hands-on senior software developer.
    </bbd:idealjob>
  </rdf:Description>
  <rdf:Description rdf:about="education">
    <rdf:_1>
      <rdf:Description rdf:about="degree1">
        <bbd:degree>AA</bbd:degree>
        <bbd:discipline>Liberal Arts</bbd:discipline>
        <bbd:date>1981-06-01</bbd:date>
        <bbd:gpa>3.98</bbd:gpa>
        <bbd:honors>High Honors</bbd:honors>
        <bbd:college>Yakima Valley Community College</bbd:college>
        <bbd:location>Yakima, Washington</bbd:location>
      </rdf:Description>
    </rdf:_1>
    <rdf:_2>
      <rdf:Description rdf:about="degree2">
        <bbd:degree>BA</bbd:degree>
        <bbd:discipline>Psychology</bbd:discipline>
        <bbd:date>1986-06-01</bbd:date>
        <bbd:gpa>3.65</bbd:gpa>
        <bbd:honors>Magna cum laude</bbd:honors>
        <bbd:honors>Dean's Scholar</bbd:honors>
        <bbd:college>Central Washington University</bbd:college>
        <bbd:location>Ellensburg, Washington</bbd:location>
      </rdf:Description>
    </rdf:_2>
    <rdf:_3>
      <rdf:Description rdf:about="degree3">
        <bbd:degree>BS</bbd:degree>
        <bbd:discipline>Computer Science</bbd:discipline>
        <bbd:date>1987-06-01</bbd:date>
        <bbd:gpa>3.65</bbd:gpa>
        <bbd:college>Central Washington University</bbd:college>
        <bbd:location>Ellensburg, Washington</bbd:location>
      </rdf:Description>
    </rdf:_3>
  </rdf:Description>
  <rdf:Description rdf:about="experience">
    <rdf:_1>
      <rdf:Description rdf:about="job1">
        <bbd:company>Boeing</bbd:company>
        <bbd:title>Data Architect</bbd:title>
        <bbd:title>Information Repository Modeler</bbd:title>
        <bbd:title>Software Engineer</bbd:title>
        <bbd:title>Database Architect</bbd:title>
        <bbd:start>1987</bbd:start>
        <bbd:end>1992</bbd:end>
        <bbd:description>
At Boeing I worked as a developer for the Peace Shield Project (FORTRAN/Ingres on
VAX/VMS). Peace Shield is Saudi Arabia's air defense system. At the end of the
project, I moved into a position of Oracle DBA and provided support for various
organizations. I worked with Oracle versions 5.0 and 6.0, and with SQL Forms, Pro*C,
and OCI. I was also interim information modeler for Boeing Commercial's Repository,
providing data modeling and design for this effort.
From the data group, I moved into my last position at Boeing, which was for the
Acoustical and Linguistics group, developing applications for Windows using
Microsoft C, C++, the Windows SDK, and using Smalltalk as a prototype tool. The
object-based applications we created utilized new speech technology as a solution to
business needs including a speech driven robotic work order system.
        </bbd:description>
      </rdf:Description>
    </rdf:_1>
  </rdf:Description>
  <rdf:Description rdf:about="skills">
    <rdf:_1>
      <rdf:Description rdf:about="java">
        <bbd:level>Expert</bbd:level>
        <bbd:years>6</bbd:years>
        <bbd:lastused>now</bbd:lastused>
      </rdf:Description>
    </rdf:_1>
    <rdf:_2>
      <rdf:Description rdf:about="C++">
        <bbd:level>Expert</bbd:level>
        <bbd:years>8</bbd:years>
        <bbd:lastused>2 years ago</bbd:lastused>
      </rdf:Description>
    </rdf:_2>
  </rdf:Description>
</rdf:RDF>
The PHP XML classes may have been updated to reflect the most recent RDF specifications by the time this book is published.
To demonstrate both the persistence capability and the query functionality of the PHP XML classes, Example 10-9 shows a complete PHP page that opens a connection to the database, loads in a document, queries the data, and then removes the document from persistent storage.
<?
mysql_connect("localhost","username","password");
mysql_select_db("databasename");
?>
<html>
<head>
<title>RDQL PHP Example</title>
</head>
<body>
<?php
include_once("C:\class_rdql_db\class_rdql_db.php");

# read in, store document
$rdqldb = new RDQL_db( );
$rdqldb->set_warning_mode(true);
$rdqldb->store_rdf_document("http://weblog.burningbird.net/resume.rdf","resume");

# build and execute query
$query='SELECT ?b
        FROM <resume>
        WHERE (?a, <bbd:title>, ?b)
        USING bbd for <http://www.burningbird.net/resume_schema#>';

# parse and print results
$rows = RDQL_query_db::rdql_query_db($query);
if (!empty($rows)) {
   foreach($rows as $row) {
      foreach($row as $key=>$val) {
         print("$val<p>");
      }
   }
}
else {
   print("No data found");
}

# data dump and delete document from db
$data = $rdqldb->get_rdf_document("resume");
print("<h3>General dump of the data</h3>");
print($data);
$rdqldb->remove_rdf_document("resume");
?>
</body>
</html>
This example is running in a Windows environment, and the path
to the PHP class is set accordingly. The method get_rdf_document
returns the RDF/XML of the document contained within the database. To
print out the elements as well as the data, modify the string before
printing:
$data=str_replace("<","&lt;",$data);
$data=str_replace(">","&gt;",$data);
print ($data);
As the example demonstrates, parsing and querying an RDF/XML document with the PHP XML classes is quite simple, one of the advantages of a consistent metadata storage and query language.
The code for Query-O-Matic Lite is even simpler. The first page with the HTML form has just one field, querystr, a textarea input field. When the form is submitted, the second page accesses this string, strips out any slashes, and then passes the string directly to the PHP class to process the query, as is shown in Example 10-10. In this example, the RDQL class is used and the document is opened directly via URL, rather than being persisted to a database first. In addition, unlike Query-O-Matic, Lite allows multiple variables in the select clause; each is printed out with spaces in between, and each row is printed on a separate line.
<html>
<head>
<title>RDQL Query-O-Matic Lite</title>
</head>
<body>
<?php
include_once("class_rdql.php");

# get the query string from the form and strip any added slashes
$querystr=stripslashes($_GET['querystr']);

# run the query against the document(s) named in its FROM clause
$rows = RDQL_query_document::rdql_query_url($querystr);
if (empty($rows)) die("No data found for your query");

# print each row on its own line, values separated by spaces
foreach($rows as $row) {
   foreach($row as $key=>$val) {
      print("$val ");
   }
   print ("<br /><br />");
}
?>
</body>
</html>
Even accounting for the HTML in the example, Query-O-Matic Lite is one of the smallest PHP applications I’ve created. However, as long as the underlying RDF/XML parser (class_rdf_parser) can parse the RDF/XML, you can run queries against the data.
Figure 10-3 shows the first page of Query-O-Matic Lite, with an RDQL query typed into the query input text box.
The query, shown in Example 10-11, accesses all degrees and disciplines within the document and prints them out.
SELECT ?degree, ?discipline
FROM <http://weblog.burningbird.net/resume.rdf>
WHERE (?a, <bbd:discipline>, ?discipline),
      (?a, <bbd:degree>, ?degree)
USING bbd for <http://burningbird.net/resume/elements/1.0/>
The results of running this query are:
AA Liberal Arts
BA Psychology
BS Computer Science
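The way the two triple patterns in this query share the variable ?a can be sketched in a few lines of Python. This is only an illustration of the matching idea, not how the PHP XML classes implement RDQL; the in-memory triples, the node names (edu1, job1, and so on), and the helper functions are all hypothetical.

```python
# Hypothetical in-memory triples standing in for the parsed resume document.
TRIPLES = [
    ("edu1", "bbd:degree", "AA"),
    ("edu1", "bbd:discipline", "Liberal Arts"),
    ("edu2", "bbd:degree", "BA"),
    ("edu2", "bbd:discipline", "Psychology"),
    ("edu3", "bbd:degree", "BS"),
    ("edu3", "bbd:discipline", "Computer Science"),
    ("job1", "bbd:title", "Software Engineer"),
]


def is_var(term):
    """RDQL-style variables start with a question mark."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # Bind the variable, or reject if it is already bound differently.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # a constant term does not match
    return new_binding


def match_patterns(patterns, triples):
    """Return every binding that satisfies all patterns (nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


patterns = [
    ("?a", "bbd:discipline", "?discipline"),
    ("?a", "bbd:degree", "?degree"),
]
rows = [(b["?degree"], b["?discipline"]) for b in match_patterns(patterns, TRIPLES)]
for degree, discipline in rows:
    print(degree, discipline)
```

Because both patterns use ?a as the subject, a degree is only ever paired with the discipline attached to the same resource; the job1 triple matches neither pattern and drops out.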
The PHP XML classes also support conditional and Boolean operators for filtering data once a subset has been found with the triple patterns. It’s just that the set of operators differs from those for Jena, as there has been no standardization of RDQL across implementations...yet. In addition, you can list more than one document in the from/source clause, and the data from both is then available for the query.
I loaded several RDF/RSS files (for more on RSS, see Chapter 13) from my web sites and then created a query that searched for all entries after a certain time (the start of 2003) and printed out the date/timestamp, title, and link to the article. Example 10-12 contains the RDQL for this query.
SELECT ?date, ?title, ?link
FROM <http://weblog.burningbird.net/index.rdf>
     <http://articles.burningbird.net/index.rdf>
     <http://rdf.burningbird.net/index.rdf>
WHERE (?a, <rdf:type>, <rss:item>),
      (?a, <rss:title>, ?title),
      (?a, <rss:link>, ?link),
      (?a, <dc:date>, ?date)
AND ?date > '2002-12-31'
USING rss for <http://purl.org/rss/1.0/>,
      dc for <http://purl.org/dc/elements/1.1/>
The data from all RDF/XML files was joined, the query made and filtered, and the resulting output met my expectations. Not only that, but the process was quite quick, as well as incredibly easy—a very effective demonstration of the power of RDF, RDF/XML, and RDQL.
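The merge-then-filter idea behind this query can be sketched in Python. The sketch assumes that the dates are ISO 8601 strings in one consistent format and time zone, and that the comparison is made on the strings themselves; whether the PHP XML classes compare dates this way is not something the example confirms. The feed contents, titles, example.org links, and the items_after helper are hypothetical.

```python
from itertools import chain

# Hypothetical items, as if already extracted from three RSS 1.0 files.
FEEDS = {
    "weblog": [
        {"date": "2003-01-15T10:00:00", "title": "Post A", "link": "http://example.org/a"},
        {"date": "2002-11-02T09:30:00", "title": "Post B", "link": "http://example.org/b"},
    ],
    "articles": [
        {"date": "2003-02-01T12:00:00", "title": "Post C", "link": "http://example.org/c"},
    ],
    "rdf": [
        {"date": "2002-12-31T23:00:00", "title": "Post D", "link": "http://example.org/d"},
        {"date": "2003-01-02T08:15:00", "title": "Post E", "link": "http://example.org/e"},
    ],
}


def items_after(feeds, cutoff):
    """Merge the items of every feed and keep those whose date sorts after cutoff."""
    merged = chain.from_iterable(feeds.values())
    return sorted(
        (item for item in merged if item["date"] > cutoff),
        key=lambda item: item["date"],
    )


recent = items_after(FEEDS, "2002-12-31")
for item in recent:
    print(item["date"], item["title"], item["link"])
```

One consequence of comparing strings is worth noting: a full timestamp such as 2002-12-31T23:00:00 sorts after the bare date 2002-12-31, so under this assumption an item from the last day of 2002 would still pass the filter. Using '2003-01-01' as the cutoff would avoid that.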
Sesame is, to quote the web site where it’s supported, “...an Open Source RDF Schema-Based Repository and Querying Facility.” It’s a Java JSP/Servlet application that I downloaded and installed on my Windows box, running it with a standalone Tomcat server (Version 4.1.18).
The Sesame web site, including source for the product and documentation, is at http://sesame.aidministrator.nl/.
Once I worked through an installation problem having to do with an extraneous angle bracket in the web.xml file definition for an Oracle database installation (something the creators of Sesame have said will be fixed), getting the application to run was a piece of cake—just start Tomcat.
I installed Sesame with support for MySQL. Once I started it (see instructions), the first thing I did was load in the monsters1.rdf test document, accessed through the URL online. The document loaded fairly quickly, though the tool didn’t provide feedback that it was finished loading.
After loading, I explored the database entries by accessing the Explore menu option (at the top of the page) and then specifying http://burningbird.net/articles/monsters1.htm as the URI to start the exploration with (the top-level resource for the test document). The page that opened is shown in Figure 10-4. Quite a nice layout, with each predicate/object defined as a hypertext link that takes you to more information about the object. Like BrownSauce, covered in Chapter 7, Sesame provides a nice RDF/XML browser.
Two other options at the top of the Sesame page allow you to query the data using RDQL (the same RDQL explored in this chapter) or using Sesame’s RQL (RDF Query Language). I accessed the RDQL page first and tried the RDQL query defined earlier in Example 10-7:
SELECT ?date
WHERE (?resource, <rdf:type>, <pstcn:Movement>),
      (?resource, <pstcn:movementType>, ?value),
      (?resource, <dc:date>, ?date)
AND (?value eq "Add")
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
      dc for <http://purl.org/dc/elements/1.1/>
Note that this query looks for the date (dc:date) of each resource movement whose movement type is "Add", that is, where the resource was added. Figure 10-5 shows the result of running this query, which was evaluated in an amazingly short amount of time, seemingly instantaneously.
RQL is similar in concept to RDQL, though not surprisingly it has a different syntax, as well as different features and functionality. For instance, using the online repository querying capability, you can easily find all RDF classes within the repository just by typing Class as the query (by itself with no other characters). For the test document, the result is:
http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
http://www.w3.org/2000/01/rdf-schema#Resource
http://www.w3.org/2000/01/rdf-schema#Literal
http://www.w3.org/2000/01/rdf-schema#Class
http://burningbird.net/postcon/elements/1.0/Resource
http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq
http://burningbird.net/postcon/elements/1.0/Movement
The PostCon classes of Movement and Resource are found, as are the RDF class Seq and the RDFS classes of Property, Resource, Literal, and Class. A variation of this query is Property, to get a listing of all properties in the repository.
To be more selective in your querying, for instance to find the source and target for a specific property, you provide the full URI of the property. To find the source and target for the predicate movementType, I typed in the following:
http://burningbird.net/postcon/elements/1.0/movementType
This returned the following:
http://www.yasd.com/dynaearth/monsters1.htm "Add"
http://www.dynamicearth.com/articles/monsters1.htm "Move"
http://burningbird.net/articles/monsters1.htm "Move"
As with RDQL, you can build complex queries using joins and conditional operations, and it’s here that RDQL and RQL are most alike. In the following, the source and target for the movementType property are queried using a more formal, SQL-like form similar to RDQL’s:
select X, Y from {X} http://burningbird.net/postcon/elements/1.0/movementType {Y}
Conditional operators are provided in a where clause following the select and from clauses. The following query finds the specific source whose movementType is equal to "Add":
select X from {X} http://burningbird.net/postcon/elements/1.0/movementType {Y} where Y = "Add"
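What the where clause does to the path expression can be sketched in Python using the three source and target pairs that the movementType query returned earlier. This is a sketch of the filtering idea only, not Sesame's implementation; the select_sources helper is hypothetical.

```python
MOVEMENT_TYPE = "http://burningbird.net/postcon/elements/1.0/movementType"

# The source/target pairs returned for movementType, written as triples.
TRIPLES = [
    ("http://www.yasd.com/dynaearth/monsters1.htm", MOVEMENT_TYPE, "Add"),
    ("http://www.dynamicearth.com/articles/monsters1.htm", MOVEMENT_TYPE, "Move"),
    ("http://burningbird.net/articles/monsters1.htm", MOVEMENT_TYPE, "Move"),
]


def select_sources(triples, predicate, condition):
    """Return each source X of predicate whose target Y satisfies condition."""
    return [
        source
        for source, pred, target in triples
        if pred == predicate and condition(target)
    ]


# Equivalent in spirit to: select X from {X} movementType {Y} where Y = "Add"
added = select_sources(TRIPLES, MOVEMENT_TYPE, lambda y: y == "Add")
print(added)
```

The from clause binds X and Y for every statement that uses the property, and the where clause then discards the bindings whose Y fails the condition; only X is projected into the result.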
To join queries, use a period between the query results. In the following RQL query, all objects that have a property of http://burningbird.net/postcon/elements/1.0/related are queried and then joined with another query that finds the titles of the related resources:
select * from http://burningbird.net/postcon/elements/1.0/related {X}. http://purl.org/dc/elements/1.1/title {Y}
The result from this query is:
http://burningbird.net/articles/monsters2.htm "Cryptozooloy"
http://burningbird.net/articles/monsters3.htm "A Tale of Two Monsters: Architeuthis Dux (Giant Squid)"
http://burningbird.net/articles/monsters4.htm "Nessie, the Loch Ness Monster "
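The join expressed by the period can be read as "the target of the first property becomes the source of the second." A minimal Python sketch of that reading follows; it is an interpretation of the query's behavior rather than Sesame's code, and the triples, the placeholder titles, and the join_path helper are hypothetical.

```python
RELATED = "http://burningbird.net/postcon/elements/1.0/related"
TITLE = "http://purl.org/dc/elements/1.1/title"

M1 = "http://burningbird.net/articles/monsters1.htm"
M2 = "http://burningbird.net/articles/monsters2.htm"
M3 = "http://burningbird.net/articles/monsters3.htm"

# Hypothetical statements; the titles are placeholders, not the real ones.
TRIPLES = [
    (M1, RELATED, M2),
    (M1, RELATED, M3),
    (M1, TITLE, "Placeholder title one"),
    (M2, TITLE, "Placeholder title two"),
    (M3, TITLE, "Placeholder title three"),
]


def join_path(triples, first_predicate, second_predicate):
    """Return (X, Y) pairs where something first_predicate X and X second_predicate Y."""
    # Targets of the first property, in document order.
    middles = [obj for _, pred, obj in triples if pred == first_predicate]
    # Index the second property by its source for the join.
    by_source = {}
    for source, pred, obj in triples:
        if pred == second_predicate:
            by_source.setdefault(source, []).append(obj)
    return [(x, y) for x in middles for y in by_source.get(x, [])]


pairs = join_path(TRIPLES, RELATED, TITLE)
for resource, title in pairs:
    print(resource, title)
```

The title of the first resource is left out of the result because that resource is never the target of a related property, which mirrors how only the related articles and their titles appear in the output above.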
You can see a great deal of similarity between the two query languages, and I like both equally well, though I’ll admit to a slight preference for the simplicity of RQL.
Of course, being able to query a repository via a predefined interface isn’t going to help you build an application. Sesame comes with a Java API for both server and client functions, including being able to run RDQL and RQL queries against the repository. I won’t cover either in this chapter, as both are quite nicely documented at the Sesame web site, and documentation is included with the downloaded product.
One additional feature of Sesame is the repository’s support for different protocols for querying the data, using SOAP and the Java RMI in addition to invoking services using HTTP. Again, these are very well documented, including examples, at the Sesame site and in the downloaded product. In addition, as was mentioned earlier in the chapter, you can also use the Sesame repository as the persistent datastore with the Jena Java API.