This system of defining everything with URIs, and using this to describe the relationships between things, has been formalized in a system known as the Resource Description Framework (RDF). In this section, we’ll look at enough RDF to give you a head start on the rest of the book. For a much deeper insight into RDF, take a look at Practical RDF (O’Reilly).
Because RDF is quite abstract—its ability to be written in different ways notwithstanding—in this chapter, we are going to look at what the RDF developers call the “data model,” which we can call “the really simple version, in pictures.”
As before, within the data model, anything (an object, a person, a document, a concept, a section of a document, etc.) can have a URI. In RDF, anything addressable with a URI is called a resource.
Some resources can be used as properties of other resources. For example, the concept of “Author” has a URI of its own (all concepts can), and other resources can have a property of “author.” Such resources are called PropertyTypes.
A property is the combination of a resource, a PropertyType, and a value. For example, “The Author of RSS and Atom is Ben Hammersley.” The value can be a string (“Ben Hammersley” in the previous example), or it can be another resource—for example, “Ben Hammersley (resource) has a home page (PropertyType) at http://www.benhammersley.com (resource).”
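As a minimal sketch, the two example properties above could be written down in Python as (resource, PropertyType, value) tuples. The `Triple` type and every `http://example.org/...` URI here are assumptions invented for illustration; only http://www.benhammersley.com comes from the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF property: a resource, a PropertyType, and a value."""
    resource: str       # URI of the thing being described
    property_type: str  # URI of the PropertyType
    value: str          # a plain string, or the URI of another resource


# Hypothetical URIs, made up for this sketch.
BOOK = "http://example.org/books/rss-and-atom"
BEN = "http://example.org/people/ben-hammersley"
AUTHOR = "http://example.org/terms/author"
HOMEPAGE = "http://example.org/terms/homepage"

# "The Author of RSS and Atom is Ben Hammersley" (the value is a string)
author_property = Triple(BOOK, AUTHOR, "Ben Hammersley")

# "Ben Hammersley has a home page at ..." (the value is another resource)
homepage_property = Triple(BEN, HOMEPAGE, "http://www.benhammersley.com")

print(author_property.value)
print(homepage_property.value)
```

The same three-part shape covers both cases; what differs is only whether the value is a literal string or a URI that can itself be described further.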
RDF’s data model is most easily understood with diagrams, called RDF graphs, that show the relationships between resources, PropertyTypes, and properties. In these diagrams, the RDF world is split into nodes and arcs.
The resources and the values are the nodes, identified by their URIs. The PropertyTypes are the arcs, representing connections between nodes. The arcs themselves are also described by a URI.
Figure 5-1 is an RDF graph that shows the previous managingEditor example as three nodes connected by two arcs, that is, two separate RDF triples. By convention, the subject is at the blunt end of the arrow, the property (or predicate) is the arrow itself, and the object is at the pointy end of the arrow.
In Figure 5-1, the subject node on the left, representing the URI http://www.example.org/example.rss, has a relationship with the object node on the right, representing the URI [email protected], and this relationship is defined by the URI http://purl.org/rss/1.0/modules/rss091#managingEditor. The subject node also has a relationship with another object node, representing the URI http://purl.org/rss/1.0/channel, and that relationship is defined by the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
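The two triples described for Figure 5-1 might be sketched in Python as a set of (subject, predicate, object) tuples. This is an illustration only: the `editor` value is a placeholder standing in for the managing editor's address, and the type arc uses the standard RDF namespace, http://www.w3.org/1999/02/22-rdf-syntax-ns#.

```python
RSS = "http://purl.org/rss/1.0/"
RSS091 = "http://purl.org/rss/1.0/modules/rss091#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

channel = "http://www.example.org/example.rss"
# Placeholder address, invented for this sketch.
editor = "mailto:editor@example.org"

graph = {
    (channel, RSS091 + "managingEditor", editor),
    (channel, RDF + "type", RSS + "channel"),
}

# Subjects and objects are the nodes; predicates are the arcs.
subjects = {s for s, _, _ in graph}
nodes = subjects | {o for _, _, o in graph}
arcs = {p for _, p, _ in graph}

print(len(nodes), "nodes,", len(arcs), "arcs")  # 3 nodes, 2 arcs
```

Both triples share the same subject, which is why the figure shows three nodes rather than four.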
What makes things interesting with RDF is that, as I’ve said before, a node can be both a subject and an object in a chain of node, arc, node, arc, node, and so on (or, to put it another way, resource, PropertyType, resource, PropertyType, resource, and so on). Consider the graph in Figure 5-2.
In this example, we’ve taken the RDF graph a step further. We’ve created a resource to represent the managing editor and given it resources of its own, with PropertyType arcs whose URIs represent the managing editor’s full name, home page, and email address. You’ll notice that the managing editor resource itself is anonymous: we haven’t defined it with a URI yet, hence the empty rectangle. This isn’t a problem.
This allows some definitive statements:
The channel (where the concept of “channel” is identified by the URI http://purl.org/rss/1.0/ and the channel itself is identified by the URI http://www.example.org/example.rss) has a resource called managingEditor (which is part of a concept defined by the URI http://purl.org/rss/1.0/modules/rss091#), which in turn has one resource of its own, identified as a “home page” in the context of the URI http://example.org/stuff/1.0/, which is itself identified with the URI http://jorge.oreilly.com/. It also has two properties, fullName and http://example.org/stuff/1.0/, with the values Jorge Grandehoncho and mailto:[email protected], respectively.
Or to put it simply:
This channel has a managing editor whose name is Jorge Grandehoncho, whose home page is http://jorge.oreilly.com/, and whose email address is [email protected].
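A rough Python sketch of the Figure 5-2 graph follows, with the anonymous managing editor node given a local label instead of a URI. The `_:editor` label, the `homePage` and `email` property names, and the email address are assumptions made for this sketch; the text names only `fullName` and the http://example.org/stuff/1.0/ context.

```python
RSS091 = "http://purl.org/rss/1.0/modules/rss091#"
STUFF = "http://example.org/stuff/1.0/"

channel = "http://www.example.org/example.rss"
editor = "_:editor"  # anonymous node: a local label, not a URI

graph = [
    (channel, RSS091 + "managingEditor", editor),
    (editor, STUFF + "fullName", "Jorge Grandehoncho"),
    # Property name assumed for this sketch.
    (editor, STUFF + "homePage", "http://jorge.oreilly.com/"),
    # Property name and address are placeholders.
    (editor, STUFF + "email", "mailto:jorge@example.org"),
]


def objects(graph, subject, predicate):
    """Return every object reached from subject along the given arc."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# Follow the chain: channel -> managingEditor -> fullName
for person in objects(graph, channel, RSS091 + "managingEditor"):
    print(objects(graph, person, STUFF + "fullName"))
```

The editor node is the object of the first triple and the subject of the other three, which is what lets the chain continue through it even though it has no URI.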
You should bear two things in mind. First, the continuation of the RDF graph doesn’t have to be constrained to one RDF document. The preceding example can be extended by including more RDF data at the network-retrievable version of the resource’s URIs. So, while the RDF data for this book may refer to me solely by author, PropertyType, and a URI, the RDF at that URI can also refer to other things I have written, and those articles can contain RDF data that refers to the subjects of the articles. This distributed nature of RDF allows for vast fields of statements to be made definitively, and every additional set of RDF data increases the power of the whole considerably. RDF data is designed with aggregation in mind.
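To illustrate the aggregation point, here is a small sketch in which RDF data from two separate documents is modeled as two Python sets of triples and merged with a set union. All of the URIs are hypothetical, and fetching the documents over the network is left out entirely.

```python
# Hypothetical PropertyType and resource URIs.
AUTHOR = "http://example.org/terms/author"
SUBJECT = "http://example.org/terms/subject"
ben = "http://example.org/people/ben"

# Triples found in the RDF data for the book.
book_rdf = {
    ("http://example.org/books/rss", AUTHOR, ben),
}

# Triples found in the RDF data published at the author's own URI.
ben_rdf = {
    ("http://example.org/articles/1", AUTHOR, ben),
    ("http://example.org/articles/1", SUBJECT, "dates"),
}

# Aggregation is a union: statements that share a URI join up automatically.
merged = book_rdf | ben_rdf

written_by_ben = sorted(s for s, p, o in merged if p == AUTHOR and o == ben)
print(written_by_ben)
```

Neither document alone lists everything the author has written; the merged graph does, because both refer to the author by the same URI.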
Second, and this will become key later on, because the PropertyTypes—the possible relationships between nodes—are represented by a URI, anyone can develop a set of elements. RDF vocabularies, therefore, can be developed to describe anything. And, as long as the URI is unique, RDF parsers won’t get confused. Your descriptive powers, therefore, are endless: either an RDF vocabulary exists, or it is simple to make up your own.
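One way to picture a vocabulary is as a base URI plus a set of short term names. The sketch below assumes that simple model; the `vocabulary` helper and the `http://example.org/fruit/1.0/` vocabulary are made up for illustration.

```python
def vocabulary(base_uri):
    """Return a function that expands a short term name to a full URI."""
    def term(name):
        return base_uri + name
    return term


rss091 = vocabulary("http://purl.org/rss/1.0/modules/rss091#")
stuff = vocabulary("http://example.org/stuff/1.0/")
fruit = vocabulary("http://example.org/fruit/1.0/")  # an invented vocabulary

print(rss091("managingEditor"))

# The same short name in two vocabularies yields two distinct URIs,
# so a parser has no trouble telling them apart.
print(stuff("name") != fruit("name"))  # True
```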
There are also various languages for describing RDF vocabularies, or ontologies, but they are outside the scope of this book. When you add all of these together, you have what is called the Semantic Web.
This system for creating definitive statements from metadata fits perfectly with the aims of RSS. RSS feeds are, at their core, collections of resources with implicit relationships, and RDF is designed to describe these relationships. Also, and most powerfully, RDF makes these relationships explicit in a way that allows them to be used.
For example, the RDF graph can be traveled in any direction. The statement “This document (subject/resource) was written (predicate/PropertyType) by Ben Hammersley (object/resource)” can be read from the other end of the graph: “Ben Hammersley (subject/resource) wrote (predicate/PropertyType) this document (object/resource).”
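Reading a statement from either end can be sketched by indexing the same triple twice, once by subject and once by object. The URIs below are hypothetical stand-ins for “this document,” the author PropertyType, and Ben Hammersley.

```python
from collections import defaultdict

# Hypothetical URIs for this sketch.
AUTHOR = "http://example.org/terms/author"
doc = "http://example.org/docs/this-document"
ben = "http://example.org/people/ben-hammersley"

graph = [(doc, AUTHOR, ben)]

outgoing = defaultdict(list)  # subject -> [(predicate, object), ...]
incoming = defaultdict(list)  # object  -> [(predicate, subject), ...]
for s, p, o in graph:
    outgoing[s].append((p, o))
    incoming[o].append((p, s))

# "This document was written by ..."
print(outgoing[doc])
# "... wrote this document"
print(incoming[ben])
```

Only one statement is stored; the two indexes are just two ways of arriving at it.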
So, you can query a database of RDF-based documents for “all the documents written by Ben Hammersley.” If more triples are declared within the documents, you can query for “all the documents written by the man with the email address [email protected],” or even “all the documents written by the man with the email address [email protected], and which are on the subject of dates.” To take it even further, you can query for “all the documents written by the man with the email address [email protected], and which are on the subject of dates (in the context of small fruits, but not romantic encounters).” By taking different paths through an RDF graph, you can extract all sorts of data quite easily. You can also, by adding in RDF vocabularies not covered by RSS, do even more complicated searches, such as, “Find me all the articles written by any friend of Ben Hammersley, during any year that Manchester United won the English Premier League,” and searches that are much more complex and interesting.
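The first few queries above can be approximated with a small pattern matcher over a list of triples. This is a sketch under stated assumptions, not how any particular RDF database works: the vocabulary URIs, document URIs, and email addresses are all placeholders, and `match` and `documents_by` are hypothetical helpers.

```python
# Hypothetical vocabulary.
AUTHOR = "http://example.org/terms/author"
EMAIL = "http://example.org/terms/email"
SUBJECT = "http://example.org/terms/subject"

graph = [
    ("http://example.org/docs/1", AUTHOR, "_:ben"),
    ("http://example.org/docs/1", SUBJECT, "dates"),
    ("http://example.org/docs/2", AUTHOR, "_:ben"),
    ("http://example.org/docs/2", SUBJECT, "feeds"),
    ("http://example.org/docs/3", AUTHOR, "_:someone"),
    ("http://example.org/docs/3", SUBJECT, "dates"),
    ("_:ben", EMAIL, "mailto:ben@example.org"),  # placeholder address
]


def match(graph, s=None, p=None, o=None):
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for triple in graph:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def documents_by(graph, email, subject=None):
    """Documents written by whoever has this email, optionally on a subject."""
    people = {s for s, _, _ in match(graph, p=EMAIL, o=email)}
    docs = {s for s, _, o in match(graph, p=AUTHOR) if o in people}
    if subject is not None:
        docs &= {s for s, _, _ in match(graph, p=SUBJECT, o=subject)}
    return sorted(docs)


print(documents_by(graph, "mailto:ben@example.org"))
print(documents_by(graph, "mailto:ben@example.org", subject="dates"))
```

Each extra condition in the query is one more path through the graph, intersected with the results so far.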
The ability of RDF to allow complex querying is one definite attraction, but the implications go further than that. Because RDF works just as well distributed as in a database, publishing an RDF version of RSS provides a remarkably useful entry point for the RDF world to access your site. Also, because the RDF vocabularies are easily definable, anyone can invent one. This makes RDF both wide ranging and fast growing, but in a way that doesn’t require a single standards overlord. In the language of RSS 1.0, RDF is extensible.