Unlike documents based on other W3C specifications, such as HTML, RDF documents don’t consist solely of the elements described in Chapter 3 through Chapter 5. Yes, there is a defined syntax for RDF, as reviewed in Chapter 3 and Chapter 4, and there is an RDF Schema, explored in Chapter 5. However, RDF isn’t used to model business-specific resources directly, because the specification contains no domain-specific elements. Instead, RDF is used to create domain-specific vocabularies, which are then used to model the resources. This has the added advantage of giving you access to RDF-specific parsers and automated processes.
What kinds of vocabularies can be created? A better question is: what kinds of business resources can be described using a syntax/schema such as RDF? And the answer is: any business resource. The number of possible vocabularies is limitless, constrained only by each industry’s need for interoperable vocabularies.
In this chapter you’ll have a chance to see how a vocabulary is created and validated against the RDF syntax and schema. Once the elements for the vocabulary are defined, they’ll then be compared against an existing web resource domain vocabulary, the Dublin Core, to look for matches.
First, though, let’s take a closer look at what I mean when I say “RDF Vocabulary.”
RDF is a way of recording information about resources. RDF serialized as XML records information about a specific business domain. It does so using a set of elements defined within the rules of the RDF data model/graph and the constraints of the RDF syntax, vocabulary, and semantics.
RDF recorded in XML is a very powerful tool—it’s been used to document events within a heterogeneous application environment, to describe publications, to record an environmental thesaurus, and so on. By using XML, you have access to a great number of existing XML applications such as parsers and APIs, even relational and Lightweight Directory Access Protocol (LDAP) data sources that are XML-capable. However, what do you get when you use RDF? Why not use XML directly?
As mentioned in previous chapters, RDF adds the same level of functionality to XML that the relational data model adds to commercial database systems. RDF provides a predefined grammar for recording business domain information consistently. As a result, any business domain can have an RDF vocabulary that a host of RDF-based tools and frameworks can process.
Consider the environmental thesaurus I just mentioned. This is a joint effort between the California Environmental Resource Evaluation System (CERES) and the National Biological Information Infrastructure (NBII). This partnership was formed to create a common environmental vocabulary and the tools necessary to work with this vocabulary. One of the efforts of this project is to document this vocabulary using RDF.
Within the RDF vocabulary, the project has defined a class called Term that has several properties attached to it, such as Source, Category, and Status. The project could have recorded this information directly in XML instead of RDF. Had they done so, however, they would have had to define the concepts of “class” and “property” in order to record relationships such as “Source is a property of Term.” The project would also have had to write code that processes the Source element as a property of Term, rather than as an arbitrary related element that happens to be nested within the Term element. Lastly, the group would have needed a schema supporting these new objects, so that the XML document could be checked against the constraints documented in that schema.
For the latter requirement, a Document Type Definition (DTD) file won’t work, as DTDs primarily control nesting and frequency of occurrence of elements; XML Schema won’t work, as it is concerned more with data types and other constraints rather than the metalanguage nature of “class” and “property.” RELAX NG is more easily processed than either of those, but again it is solving different problems.
Just as you can use XML to serialize the contents of a relational database, you can use XML to serialize the contents of an RDF-based model. XML isn’t a replacement for RDF, though, because XML is nothing more than a syntax. To use XML to record information about any business domain, you need a metalanguage vocabulary, and RDF provides this capability.
However, don’t take my word for it; try it yourself in the next several sections when you have a chance to see how a vocabulary is created.
As the Web has matured, more and more of the posted content is aging beyond usefulness. In many cases, this aged content is just deleted from a web site, resulting in “404 Page not found” errors when you click through to the content from some search engine or via a link from another web page. Hitting a missing page is particularly frustrating if you’ve come to the page because of a description associated with it that exactly fits your current interest, and you don’t even know why the page was deleted or if the resource might exist somewhere else.
A further problem with maturing web sites is that site structure doesn’t remain constant—due to the use of new technologies or new directions in content management, resources may be moved around at the site or even moved to new domains. When you access the content, the less-than-helpful sites return with something along the lines of:
404 Not Found We're sorry, the file that you requested does not exist or has moved.
Well, which is it? Is the page missing, or was the request invalid because the content’s moved? If you get this message as a result of clicking on a link from another site, is it because the content’s really been deleted or moved, or because the linking site made a mistake with the link? Is the site that owns the content using a new system of cataloging its resources, breaking existing links?
Other sites provide a page with a forwarding message and a link to redirect you to the new content. As important as these redirections are, though, the reasons behind the move may be additional information that can be useful in determining whether the resource is worth pursuing through what could end up being a chain of redirections, with each link in the chain reflecting a different move.
Unfortunately, the reasons for the move aren’t maintained with the redirection in most cases.
Another problem is aging content that isn’t deleted. With this type of page, you could be halfway through reading it only to realize that it talks about a product or technology that’s been obsolete for years. There’s nothing to indicate the relevance of the page, and external factors associated with the page, such as the page title or label, may not provide enough context to determine whether the resource is useful for your purposes or not.
Netscape’s support of Dynamic HTML (DHTML) for the company’s browser is a classic case of content being under one label—DHTML—with two drastically different implementations based on browser version. DHTML for Version 4.x of Netscape won’t work with the current Netscape 6.x products and vice versa. The only way to determine whether a page titled “Working with DHTML in Netscape” is useful for your purposes is to read it and hope you know enough about the subject to know whether you’re wasting your time.
Content management systems such as FrontPage, Vignette, and others help with creating, posting, and managing the original content. They don’t help provide information about the context of the resource. HTML meta tags can be attached to each HTML resource to provide copyright information, keywords, or authorship. They say nothing, however, about the expected lifespan of the resource or its move history, including the reasons for each move. The only workaround is to put this information into the description, an approach that isn’t standardized and is therefore of little use.
These systems are as helpless as web browsers at determining whether a 404 error occurred because of a typo, a relocation, or a resource no longer being maintained at the site.
What’s needed is a content system that takes over after the content management systems have finished their task of posting the content: a postcontent information system that can be accessed by a runtime application and provide information about the resource to the resource consumers. Such a system must provide information that is useful for humans and is also usable by automated processes.
We’ll use this type of system to demonstrate how to create an RDF vocabulary and, eventually, how to use the vocabulary just created. For simplicity in this chapter (and later in the book), I’ll refer to this system as PostCon.
How do you start defining the vocabulary for this type of system? As with most application development efforts, the first step is to define the business domain elements, and the properties of interest for each, within the given business scope.
Defining the business elements for a new system is the same process whether the domain is being defined for use within a more traditional relational database or within a system with data defined and managed through RDF-capable processes. Following from existing data modeling techniques, you first describe the major entities and their properties, then describe how these entities are related to one another.
PostCon has one major or root element, the web site resource; the system is interested in this resource from six different perspectives:
What is the content’s bio—who wrote it, who owns it, when was it created, and what are its subject and topic?
What is the content’s relevancy—has it been updated for new circumstances and does it have a date beyond which it is no longer pertinent?
What is the content’s history of movement—has it been deleted? If so, why? Has it moved? If so, why, and where is it now?
What are the content’s related resources—has it been replaced? Are other resources related to it? Are other resources dependent on it, or is it dependent on other resources?
If the resource no longer exists, are there replacements? Why are they replacements?
What are the presentation characteristics of the content? Its type? Does it conform to any standard? Does it require specialized user agents? Are there any dependencies?
The set of PostCon objects consists of a web resource, its bio, the movements associated with the resource, presentation and type information, and other related resources. Each object is then described by a set of properties. Many of these properties correspond to HTML elements, such as the title element and the description meta tag, and should be kept synchronized with the values in the HTML. Others are unique to the system.
The main system elements are then described by a set of properties, as defined in Table 6-1.
| Element | Property | Description |
| --- | --- | --- |
| Resource | Unique Content ID | To identify content |
| | Biography | Content biographical information |
| | Relevancy | Relevancy of content |
| | History | History of content movement |
| | Related | Related content |
| | Presentation | Content type and presentation |
| Biography | Title | Resource’s title |
| | Resource Abstract | Excerpt from resource, if applicable |
| | Resource Description | Description of resource |
| | Creation Date | Date resource was first created |
| | Content Author | Person or organization responsible for creating content |
| | Content Owner | Person or organization that owns the copyright on content |
| Relevancy | Content Status | Current status of content |
| | Subject | Subject/topic of resource (may repeat) |
| | Relevancy Expiration | Date when content is aged beyond usefulness |
| | References | External resources referenced in content |
| | Referenced by | External resources that reference content |
| History | Movement | Location at end of movement |
| | Reason | Reason for movement |
| | Date | Date of movement |
| | Type | Type of movement |
| Related | Related Resource | Resource URI |
| | Reason | Reason for relationship |
| Recommendation | Recommended Resource | URI of recommended replacement |
| | Title | Title of replacement |
| | Reason | Reason for recommendation |
| Presentation | Format | Format of resource |
| | Conformity | Standards/specifications resource conforms to (may repeat) |
| | Requires | Resource dependencies (may repeat); may have an associated type of requirement as well as the required resource |
The Unique Content ID is the resource’s URI (Uniform Resource Identifier). It is defined once for the content and follows it regardless of the content’s current location. The Title property is equivalent to the HTML title element. The Resource Description is equivalent to the description meta tag, which contains a short abstract of the resource’s contents:
<meta name="description" content="Dynamic Earth site focuses on science and the world and universe around us. You can never know too much">
The material within the content attribute is used for the Resource Description. The Content Author is equivalent to the author meta tag, and the Content Owner is equivalent to the copyright meta tag:
<meta name="author" content="Shelley Powers">
<meta name="copyright" content="© 1997-2003 Burningbird">
The Content Status for the web resource contains information about the current status of the document, such as whether it has been deleted or is still active. The Relevancy Expiration is a date when the content author expects the resource contents to become dated and no longer viable. The Requires property also provides information about the viability of the content, such as being dependent on Version 1.0 of a specific product release.
The History of the resource tracks its movement throughout the network, as well as the date and reason for the move. This is particularly useful when providing information about deleted content. The Related material provides information about replacement URLs for content that is no longer viable, and the Recommendation material covers additional recommended material complementary to the material, while the Presentation reflects information necessary to “consume” the resource, as it were.
For a specific web resource, there is one each of the bio, Relevancy, History, and Presentation sections, but there can be many related items. Within the History section, there can also be many movements. This cardinality information and the domain information are then used to prototype the RDF vocabulary, as described next.
Before creating a formal RDFS document for the new vocabulary, you should prototype the model with several different instances, to make sure the results match the expected outcome. During this process, check your data with the RDF Validator. It validates the RDF/XML against the specification and also provides a directed graph and an N-Triples breakdown of the RDF.
You can access the RDF Validator at http://www.w3.org/RDF/Validator/.
As a test case for the PostCon vocabulary, information about the giant squid articles introduced in Chapter 2 through Chapter 4 is recorded using the domain elements from the last section. The articles are particularly useful as test cases for several reasons: they have been moved about, they are related to each other, they reference external resources, and they are referenced by external resources. About the only thing the articles don’t demonstrate is a deleted web resource, and we’ll test that case with another document later.
When creating a new vocabulary, the first thing to do is define the URI for the vocabulary namespace. By convention, this should be the URL of the RDFS document when it is eventually made. In the case of PostCon, I used the following URL for the namespace:
http://burningbird.net/postcon/elements/1.0/
This is actually fairly descriptive—this is the location of the set of PostCon Version 1.0 vocabulary elements. When the RDFS document for the vocabulary is finished, it will be dropped into this location primarily for use by utilities that make use of it for RDF/XML exploration (covered in Chapter 7).
There is no requirement on the structure of a namespace URI, nor does the RDFS document have to exist. It is good practice, however, to use a consistent namespace, and to create the RDFS document and place it at the namespace URL.
Next up is determining the URI of the web resource. We could create a new identifier for our resources. My preference for the PostCon system, though, is simply to use the URL the resource had when it was first defined within the PostCon RDF/XML vocabulary. What’s important is that the identifier be consistent and unique. Any other requirements are purely system dependent, not RDF/XML dependent.
I used the first document in the article series as the test case, and since it was located within the domain http://burningbird.net and within the articles subdirectory, its URI became:
http://burningbird.net/articles/monsters1.htm
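To simplify the model, however, xml:base (explained in Chapter 3) is set to http://burningbird.net/articles/, and the resource URI is set to monsters1.htm. The trailing slash on the base matters. Without it, monsters1.htm would resolve to http://burningbird.net/monsters1.htm rather than to the file in the articles subdirectory.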
The other top-level predicates are added without any values, giving a relatively flat model at this point. Example 6-1 shows the RDF/XML at this stage.
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <rdf:Description rdf:about="monsters1.htm">
    <pstcn:bio />
    <pstcn:relevancy />
    <pstcn:presentation />
    <pstcn:history />
    <pstcn:related />
  </rdf:Description>
</rdf:RDF>
Before adding the other predicates, there’s one change to make to the model. As currently defined, the model has the resource, but we don’t necessarily know what kind of thing it is. It is a web resource, but by the model’s definition it could be any resource identified by an arbitrary URI, including a person, a place, or a thing. To refine the model, we’ll add an rdf:type predicate with a value of http://burningbird.net/postcon/elements/1.0/Resource. To keep the model as simple as possible, though, we’ll use an RDF/XML shortcut (detailed in Section 3.5). The shortcut replaces the rdf:Description block with a typed node element named after this new class:
<pstcn:Resource rdf:about="monsters1.htm">
  <pstcn:bio />
  <pstcn:relevancy />
  <pstcn:presentation />
  <pstcn:history />
  <pstcn:related />
</pstcn:Resource>
The directed graph that results from this change, shown in Figure 6-1, is no different from the graph produced by the more formal rdf:Description block with the associated rdf:type predicate.
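For comparison, here is a sketch of the longer form that the typed-node shortcut abbreviates. It uses a generic rdf:Description node with an explicit rdf:type arc whose object is the Resource class URI given above. Both forms should yield the same triples for monsters1.htm; this version simply spells out the rdf:type statement that the shortcut leaves implicit.

```xml
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <!-- Long form: a generic node plus an explicit rdf:type arc.
       Should be equivalent to <pstcn:Resource rdf:about="monsters1.htm"> -->
  <rdf:Description rdf:about="monsters1.htm">
    <rdf:type rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource" />
    <pstcn:bio />
    <pstcn:relevancy />
    <pstcn:presentation />
    <pstcn:history />
    <pstcn:related />
  </rdf:Description>
</rdf:RDF>
```

The rdf:type value is an absolute URI, so xml:base has no effect on it. Only rdf:about="monsters1.htm" is resolved against the base.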
Next we’ll start adding the predicates, beginning with pstcn:bio. RDF/XML requires a striped syntax of node-arc-node-arc, and pstcn:bio is acting as an arc. Its contents must therefore be redefined as a blank node, a resource without a URI. Adding an rdf:Description block to pstcn:bio and then adding its predicates, as shown in Example 6-2, accomplishes this. The predicates are named after the properties defined in Table 6-1, converted to QNames per the RDF/XML requirement. The new material is the content of the pstcn:bio element.
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <pstcn:Resource rdf:about="monsters1.htm">
    <pstcn:bio>
      <rdf:Description>
        <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
        <pstcn:abstract>
          When I think of "monsters" I think of the creatures of
          legends and tales, from the books and movies, and
          I think of the creatures that have entertained me for years.
        </pstcn:abstract>
        <pstcn:description>
          Part 1 of four-part series on cryptozoology, legends,
          Nessie the Loch Ness Monster and the giant squid.
        </pstcn:description>
        <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
        <pstcn:author>Shelley Powers</pstcn:author>
        <pstcn:owner>Burningbird Network</pstcn:owner>
      </rdf:Description>
    </pstcn:bio>
    <pstcn:relevancy />
    <pstcn:presentation />
    <pstcn:history />
    <pstcn:related />
  </pstcn:Resource>
</rdf:RDF>
The object of pstcn:bio isn’t given a URI, because one doesn’t exist for it. The resulting graph shows a computer-generated blank node identifier assigned to that resource.
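You don’t have to leave the naming of the blank node to the parser. RDF/XML’s rdf:nodeID attribute lets you give the node a document-local label and describe it in a separate block. The label bio1 below is a hypothetical name chosen for illustration; it has no meaning outside this document, and a parser still treats the node as blank. This is a sketch of a fragment meant to sit inside the same rdf:RDF element as Example 6-2, with two of the bio properties left out for brevity.

```xml
<pstcn:Resource rdf:about="monsters1.htm">
  <!-- "bio1" is a hypothetical, document-local label for the blank node -->
  <pstcn:bio rdf:nodeID="bio1" />
</pstcn:Resource>

<!-- The same blank node, described separately -->
<rdf:Description rdf:nodeID="bio1">
  <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
  <!-- pstcn:abstract and pstcn:description omitted for brevity -->
  <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
  <pstcn:author>Shelley Powers</pstcn:author>
  <pstcn:owner>Burningbird Network</pstcn:owner>
</rdf:Description>
```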
Again, to simplify the model as much as possible, another RDF/XML shortcut is applied. In this case, the attribute rdf:parseType is added to the pstcn:bio element, and its value is set to "Resource". Doing this eliminates the rdf:Description block:
<pstcn:bio rdf:parseType="Resource">
  <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
  <pstcn:abstract>
    When I think of "monsters" I think of the creatures of legends
    and tales, from the books and movies, and I think of the
    creatures that have entertained me for years.
  </pstcn:abstract>
  <pstcn:description>
    Part 1 of four-part series on cryptozoology, legends,
    Nessie the Loch Ness Monster and the giant squid.
  </pstcn:description>
  <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
  <pstcn:author>Shelley Powers</pstcn:author>
  <pstcn:owner>Burningbird Network</pstcn:owner>
</pstcn:bio>
Though simplified with this syntactic change, the resulting directed graph of the model at this point, as shown in Figure 6-2, is equivalent to the longer, more formal syntax.
The shortcuts make the XML simpler, but the XML no longer mirrors the N-Triples or the directed graph of the model as directly. This can be confusing for people new to RDF/XML. When documenting your model, you’ll most likely want to start with the more formal RDF/XML syntax and then demonstrate the vocabulary with instances that use the shortcuts.
In Figure 6-2, I show the bio properties grouped via a blank node. Coming from a relational database background, my first inclination is to group related properties into a resource and link this back to the primary resource, rather than “flatten” the model and include each property as a direct attribute of the original resource. I follow this approach with RDF, primarily because, in my opinion, it leads to cleaner RDF processing—whether that processing occurs manually or through automation.
Suppose I had listed each of the “grouped” properties directly on the resource. There would then be no breakdown for relevancy or for the resource’s bio. If a specific process were interested only in the biographical elements, each bio-related property would have to be marked as biographical to distinguish it from the other properties. If, however, the bio-related properties are defined within one specific RDF “entity” (resource), processing only the bio properties is simple: just process all elements within the designated bio resource. Whether you’re generating RDF through an API, consuming it with an RDF parser, or reading an RDF document visually, grouping the properties through derived resources makes sense.
The other groupings of properties, such as relevancy and presentation, are completed in the same manner as bio, and I won’t cover them all here. The Related predicate, however, is handled differently and is therefore covered in the next section.
Not all recorded values occur as single properties within the PostCon vocabulary—a web resource can move many times, and there can be more than one recommended resource to replace an outdated item. The vocabulary must be able to handle repeating properties. Within the RDF specification, you can use the same predicate in multiple statements, such as the following:
<pstcn:related rdf:resource="monsters2.htm" />
<pstcn:related rdf:resource="monsters3.htm" />
<pstcn:related rdf:resource="monsters4.htm" />
The distinguishing aspect of these statements then becomes the object, the predicate value. Attached to the primary resource, this syntax states that there are three related resources for the entity being defined. It also states that there’s no order to the resources, and the only point of connectivity between the resources is that they’re related, in some way, to the original entity. There is neither an implicit nor an explicit grouping between the items.
At this point, the RDF/XML just shows the three related resources, and the resulting directed graph would show these items with ovals drawn around the objects as well as the resource. However, if I wanted to include additional information about the relationship between the related resources and the resource being defined in the document, I could do so in a couple of ways.
First, I could define each related resource using the rdf:parseType="Resource" setting, as I did with pstcn:bio. The problem is that each of the related resources actually does have a URI. With rdf:parseType, I’d lose this information, because the object of the predicate becomes a blank node. Instead, I’ll use the rdf:resource attribute, which lets me specify the URI of the resource.
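The two options are sketched side by side below, using monsters2.htm. With rdf:parseType="Resource", the object of pstcn:related becomes a blank node, and the property element offers no place to attach the URI monsters2.htm to it. With rdf:resource, the object is the article’s own URI. The first form is shown only to illustrate the rejected approach.

```xml
<!-- Rejected: the object is a new blank node; the URI monsters2.htm
     appears nowhere in the resulting graph -->
<pstcn:related rdf:parseType="Resource">
  <pstcn:title>Cryptozoology</pstcn:title>
  <pstcn:reason>First in the Tale of Two Monsters series.</pstcn:reason>
</pstcn:related>

<!-- Chosen: the object is the related article's own URI,
     resolved against xml:base -->
<pstcn:related rdf:resource="monsters2.htm" />
```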
These resources are related to, but separate from, the main resource, and I want my model to reflect this. I’ll therefore define the related resources as separate resources, connected to the main resource only through their URIs. Example 6-3 shows the RDF/XML for the PostCon instance with the three related resources. Each is defined using the pstcn:Resource class and includes the related-resource properties title and reason.
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <pstcn:Resource rdf:about="monsters1.htm">
    <pstcn:bio rdf:parseType="Resource">
      <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
      <pstcn:abstract>
        When I think of "monsters" I think of the creatures of legends
        and tales, from the books and movies, and I think of the
        creatures that have entertained me for years.
      </pstcn:abstract>
      <pstcn:description>
        Part 1 of four-part series on cryptozoology, legends,
        Nessie the Loch Ness Monster and the giant squid.
      </pstcn:description>
      <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
      <pstcn:author>Shelley Powers</pstcn:author>
      <pstcn:owner>Burningbird Network</pstcn:owner>
    </pstcn:bio>
    <pstcn:related rdf:resource="monsters2.htm" />
    <pstcn:related rdf:resource="monsters3.htm" />
    <pstcn:related rdf:resource="monsters4.htm" />
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters2.htm">
    <pstcn:title>Cryptozoology</pstcn:title>
    <pstcn:reason>First in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters3.htm">
    <pstcn:title>A Tale of Two Monsters: Architeuthis Dux</pstcn:title>
    <pstcn:reason>Second in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters4.htm">
    <pstcn:title>Nessie, the Loch Ness Monster</pstcn:title>
    <pstcn:reason>Fourth in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
</rdf:RDF>
Since the predicates associated with each related resource are simple and nonrepeating, I’m going to apply another shortcut to simplify the model—simple nonrepeating predicates can be listed as attributes on the resource:
<pstcn:Resource rdf:about="monsters2.htm"
  pstcn:title="Cryptozoology"
  pstcn:reason="First in the Tale of Two Monsters series." />
<pstcn:Resource rdf:about="monsters3.htm"
  pstcn:title="A Tale of Two Monsters: Architeuthis Dux"
  pstcn:reason="Second in the Tale of Two Monsters series." />
<pstcn:Resource rdf:about="monsters4.htm"
  pstcn:title="Nessie, the Loch Ness Monster"
  pstcn:reason="Fourth in the Tale of Two Monsters series." />
The underlying triples and the resulting directed graph are the same. The only effect of this change is that the XML is simpler and a little easier to read. It’s also more comfortable for people familiar with XML, though, as stated earlier, it tends to obscure the RDF constructs.
There’s another reason to use this shortcut. If I preferred not to list the resources separately, I could move them, with their predicates expressed as attributes, directly into the main resource. You couldn’t do this with nested predicate elements. An element that uses the rdf:resource attribute must be empty, and adding formal predicate elements to it generates errors. You would instead have to use the more formal node-arc-node structure: the predicate (pstcn:related) contains an rdf:Description block, which in turn contains the related predicates:
<pstcn:related>
  <rdf:Description rdf:about="monsters3.htm"
    pstcn:title="A Tale of Two Monsters: Architeuthis Dux"
    pstcn:reason="Second in the Tale of Two Monsters series." />
</pstcn:related>
However, you can combine the rdf:resource attribute with the predicates-as-attributes shortcut. An empty pstcn:related element can carry both the URI of the related resource and, as attributes, simple nonrepeating predicates about that resource. Example 6-4 demonstrates this.
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <pstcn:Resource rdf:about="monsters1.htm">
    <pstcn:bio rdf:parseType="Resource">
      <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
      <pstcn:abstract>
        When I think of "monsters" I think of the creatures of legends
        and tales, from the books and movies, and I think of the
        creatures that have entertained me for years.
      </pstcn:abstract>
      <pstcn:description>
        Part 1 of four-part series on cryptozoology, legends,
        Nessie the Loch Ness Monster and the giant squid.
      </pstcn:description>
      <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
      <pstcn:author>Shelley Powers</pstcn:author>
      <pstcn:owner>Burningbird Network</pstcn:owner>
    </pstcn:bio>
    <pstcn:related rdf:resource="monsters2.htm"
      pstcn:title="Cryptozoology"
      pstcn:reason="First in the Tale of Two Monsters series." />
    <pstcn:related rdf:resource="monsters3.htm"
      pstcn:title="A Tale of Two Monsters: Architeuthis Dux"
      pstcn:reason="Second in the Tale of Two Monsters series." />
    <pstcn:related rdf:resource="monsters4.htm"
      pstcn:title="Nessie, the Loch Ness Monster"
      pstcn:reason="Fourth in the Tale of Two Monsters series." />
  </pstcn:Resource>
</rdf:RDF>
In some ways, this demonstrates that you commit either to formal syntax all the way or to abbreviated (shortcut) syntax all the way, at least for one complete RDF construct such as the related items. My reasons for wanting to list the related resources separately still hold, even though both approaches record the same title and reason for each related resource. I’ll therefore continue to use the approach demonstrated in Example 6-3.
If I want to show that the objects of a predicate are related to one another in some way beyond their common relationship to the defined entity, I’ll use a container to group the items and then attach that container to the entity. The next section describes how.
The PostCon vocabulary treats the movements of a web resource as related to one another. The first movement occurs when the resource is added to the web site. The second and each additional movement are related to one another by the date and time of the movement. There is no upper limit on the number of movements.
To group like items that are related to one another as well as to the main resource, I could use either an RDF Container or a Collection. Both provide the grouping-of-related-items semantics that I need, but the relationship and number of items within the grouping differ based on which construct I use. And that’s how I’ll determine which to use.
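As described in Chapter 4, a Container is an open group of related items. Nothing in the graph states that the listed members are the only members, so the group has no nth point and more items can always be added. A Collection, on the other hand, always has an endpoint: rdf:nil, which is implicit when you use the rdf:parseType="Collection" shortcut in RDF/XML but explicit in the graph. Using a Collection therefore creates the assumption that the grouping consists of a finite, closed set of objects.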
Containers and collections also carry additional tool-based semantics, such as the ordering implied by rdf:Seq. These aren’t enforced within the RDF data model/graph, so I won’t depend on them to make my decision. Instead, I’ll rely on the one factor that the RDF graph itself records: whether the group is open-ended or closed. I determined that a web resource can move any number of times, so its history is open-ended, and I will choose an RDF Container.
I now face another choice: which container type to use. RDF doesn't enforce the differences between container types, but there is a general assumption about the behavior attached to each, so I'll want to pick the RDF Container type (Seq, Bag, or Alt) that fits my vocabulary model.
Since each movement is unique and the movements occur in order, the Bag type isn't a good fit: its implicit semantics are those of an unordered group in which items can be duplicated. Nor is the Alt type a good fit, because it implicitly represents items that are alternatives to each other. The best fit is Seq, which implies related items in a sequence, from first to last. This fits history particularly well.
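Because a Seq's order lives in the numeric suffix of its membership properties rather than in document order, a consumer has to sort by that number. The sketch below uses Python's standard `xml.etree.ElementTree` to read an `rdf:Seq` fragment; the members are deliberately out of order and the tenth entry (`http://example.org/tenth-move.htm`) is hypothetical. As a simplifying assumption it handles only explicit `rdf:_n` elements, not `rdf:li`.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEMBER_PREFIX = "{" + RDF + "}_"  # Clark-notation prefix of rdf:_1, rdf:_2, ...

# Members deliberately out of document order; the tenth entry is hypothetical.
HISTORY = """<rdf:Seq xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:_2 rdf:resource="http://www.dynamicearth.com/articles/monsters1.htm"/>
  <rdf:_10 rdf:resource="http://example.org/tenth-move.htm"/>
  <rdf:_1 rdf:resource="http://www.yasd.com/dynaearth/monsters1.htm"/>
</rdf:Seq>"""


def seq_members(seq):
    """Return the rdf:resource of each rdf:_n member, ordered by n."""
    members = []
    for child in seq:
        if child.tag.startswith(MEMBER_PREFIX):
            n = int(child.tag[len(MEMBER_PREFIX):])
            members.append((n, child.get("{" + RDF + "}resource")))
    # Sort numerically: a plain string sort would place _10 before _2.
    return [uri for _, uri in sorted(members)]


for uri in seq_members(ET.fromstring(HISTORY)):
    print(uri)
```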
Each movement has its own URI representing the movement itself, so each one can be identified distinctly. Because of this, my preference is, again, to list the movements separately, related to the main resource through the container. Example 6-5 shows the PostCon vocabulary after adding the Seq container. Note that I created a new class for the movement, pstcn:Movement. I couldn't use pstcn:Resource, because the movements really aren't resources. I could also have left the movements defined in generic rdf:Description blocks, but I prefer to embed as much information into the model as possible, and defining the new class, Movement, provides a type for each movement definition, independent of the relationship defined by history earlier in the main resource.
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">

  <pstcn:Resource rdf:about="monsters1.htm">

    <!--biography of resource-->
    <pstcn:bio rdf:parseType="Resource">
      <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
      <pstcn:abstract> When I think of "monsters" I think of the creatures of legends and tales, from the books and movies, and I think of the creatures that have entertained me for years. </pstcn:abstract>
      <pstcn:description> Part 1 of four-part series on cryptozoology, legends, Nessie the Loch Ness Monster and the giant squid. </pstcn:description>
      <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
      <pstcn:author>Shelley Powers</pstcn:author>
      <pstcn:owner>Burningbird Network</pstcn:owner>
    </pstcn:bio>

    <!--related resources-->
    <pstcn:related rdf:resource="monsters2.htm" />
    <pstcn:related rdf:resource="monsters3.htm" />
    <pstcn:related rdf:resource="monsters4.htm" />

    <!--resource movements-->
    <pstcn:history>
      <rdf:Seq>
        <rdf:_1 rdf:resource="http://www.yasd.com/dynaearth/monsters1.htm" />
        <rdf:_2 rdf:resource="http://www.dynamicearth.com/articles/monsters1.htm" />
        <rdf:_3 rdf:resource="http://burningbird.net/articles/monsters1.htm" />
      </rdf:Seq>
    </pstcn:history>

  </pstcn:Resource>

  <!--related resource definitions-->
  <pstcn:Resource rdf:about="monsters2.htm">
    <pstcn:title>Cryptozoology</pstcn:title>
    <pstcn:reason>First in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

  <pstcn:Resource rdf:about="monsters3.htm">
    <pstcn:title>A Tale of Two Monsters: Architeuthis Dux (Giant Squid)</pstcn:title>
    <pstcn:reason>Second in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

  <pstcn:Resource rdf:about="monsters4.htm">
    <pstcn:title>Nessie, the Loch Ness Monster </pstcn:title>
    <pstcn:reason>Fourth in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

  <!--resource movement definitions-->
  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters1.htm">
    <pstcn:movementType>Add</pstcn:movementType>
    <pstcn:reason>New Article</pstcn:reason>
    <pstcn:date>1998-01-01T00:00:00-05:00</pstcn:date>
  </pstcn:Movement>

  <pstcn:Movement rdf:about="http://www.dynamicearth.com/articles/monsters1.htm">
    <pstcn:movementType>Move</pstcn:movementType>
    <pstcn:reason>moved to dynamicearth.com domain</pstcn:reason>
    <pstcn:date>1999-10-31T00:00:00-05:00</pstcn:date>
  </pstcn:Movement>

  <pstcn:Movement rdf:about="http://burningbird.net/articles/monsters1.htm">
    <pstcn:movementType>Move</pstcn:movementType>
    <pstcn:reason>Moved to burningbird.net</pstcn:reason>
    <pstcn:date>2002-11-01T00:00:00-05:00</pstcn:date>
  </pstcn:Movement>

</rdf:RDF>
There is also something intriguing in this RDF/XML example: the actual resource is defined both as the document Resource and as a Movement. In fact, it is the last movement in the history, because xml:base resolves monsters1.htm to http://burningbird.net/articles/monsters1.htm, the same URI as the final rdf:Seq member. This is perfectly legitimate and results in an interesting directed graph of a resource that has an arc pointing back to itself, as demonstrated in Figure 6-3.
Also notice in the figure that the original resource now has two type properties associated with it: one for Resource and one for Movement. Again, this is perfectly legitimate RDF. In fact, the more knowledge we can put into the model, and the simpler the syntax, the better.
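The double typing can be checked mechanically by resolving each `rdf:about` against `xml:base` and collecting the classes recorded for each URI. This is a minimal sketch using only the Python standard library on a trimmed copy of Example 6-5; it assumes that classes appear only as top-level typed node elements and ignores `rdf:Description` blocks with explicit `rdf:type` properties.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Trimmed version of Example 6-5: only the typed node elements are kept.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <pstcn:Resource rdf:about="monsters1.htm"/>
  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters1.htm"/>
  <pstcn:Movement rdf:about="http://www.dynamicearth.com/articles/monsters1.htm"/>
  <pstcn:Movement rdf:about="http://burningbird.net/articles/monsters1.htm"/>
</rdf:RDF>"""


def types_by_uri(root):
    """Resolve each rdf:about against xml:base and collect its class names."""
    base = root.get(XML_BASE, "")
    types = defaultdict(set)
    for node in root:  # top-level typed node elements only
        uri = urljoin(base, node.get("{" + RDF + "}about"))
        types[uri].add(node.tag.split("}", 1)[1])  # local name of the class
    return types


for uri, classes in sorted(types_by_uri(ET.fromstring(DOC)).items()):
    print(uri, sorted(classes))
```

The relative `monsters1.htm` and the absolute URI of the last movement collapse into one key, which is exactly why the graph shows a single node carrying both types.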
The example RDF/XML demonstrated to this point has focused on bio, history, and related resources. The other PostCon groupings, Relevancy and Presentation, are treated the same as bio, except for one new construct: the Presentation's requires property. Unlike the other properties defined up to this point, requires is neither a straight resource property nor a literal; it's a value with an associated type that determines how the value is treated. The ideal RDF/XML construct to represent this is rdf:value.
Without replicating all of the Presentation properties, the following RDF/XML demonstrates how rdf:value works for pstcn:requires. The pstcn:requires property is defined with an rdf:parseType of "Resource" and contains two properties: pstcn:type, which specifies the type of required resource, and rdf:value, which holds the actual value. Two resources are required:
<pstcn:presentation rdf:parseType="Resource">
  <pstcn:requires rdf:parseType="Resource">
    <pstcn:type>stylesheet</pstcn:type>
    <rdf:value>http://burningbird.net/de.css</rdf:value>
  </pstcn:requires>
  <pstcn:requires rdf:parseType="Resource">
    <pstcn:type>logo</pstcn:type>
    <rdf:value>http://burningbird.net/mm/dynamicearth.jpg</rdf:value>
  </pstcn:requires>
</pstcn:presentation>
The intended semantics of rdf:value are that it always holds the actual value of the property; anything else in the block, such as pstcn:type here, is qualifying information about how that value is treated.
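A consuming application can rely on that convention: take `rdf:value` as the main value and everything else as qualifiers. The sketch below, a hypothetical helper built on Python's `xml.etree.ElementTree`, pulls `(type, value)` pairs out of the presentation fragment above; it assumes each `pstcn:requires` block holds exactly one `pstcn:type` and one `rdf:value`.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
PSTCN = "{http://burningbird.net/postcon/elements/1.0/}"

FRAGMENT = """<pstcn:presentation rdf:parseType="Resource"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">
  <pstcn:requires rdf:parseType="Resource">
    <pstcn:type>stylesheet</pstcn:type>
    <rdf:value>http://burningbird.net/de.css</rdf:value>
  </pstcn:requires>
  <pstcn:requires rdf:parseType="Resource">
    <pstcn:type>logo</pstcn:type>
    <rdf:value>http://burningbird.net/mm/dynamicearth.jpg</rdf:value>
  </pstcn:requires>
</pstcn:presentation>"""


def requirements(presentation):
    """Pair each requirement's qualifier (pstcn:type) with its rdf:value."""
    result = []
    for req in presentation.findall(PSTCN + "requires"):
        kind = req.findtext(PSTCN + "type")            # qualifier
        value = (req.findtext(RDF + "value") or "").strip()  # main value
        result.append((kind, value))
    return result


for kind, value in requirements(ET.fromstring(FRAGMENT)):
    print(kind, value)
```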
The rest of the vocabulary uses the same constructs as have been used to this point and is omitted for brevity. A complete example of the vocabulary is given later, after a few modifications are made to merge the vocabulary with the Dublin Core. In the meantime, testing the vocabulary demonstrated to this point against other web site test cases shows that it can represent all of the business domain data. At this point, we can be comfortable that the vocabulary matches the system's needs. The next step is to formalize the vocabulary schema using RDF Schema.
Formally defined RDFS schemas aren't required for all RDF documents, but a schema documents the vocabulary in a machine-readable form, which helps keep the RDF documents that use it semantically and syntactically consistent across implementations.
RDFS defines which vocabulary elements are classes and which are properties. In addition, RDFS associates each property with the classes it applies to (its domain) and defines the range of values for each property. This is particularly helpful when defining properties that can take a range of values, such as the pstcn:movementType property in the last section. RDFS can also document the type of literal each property references, such as whether the property value is a string or a number like an integer.
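In RDFS semantics, domain and range declarations are inference rules rather than validation checks: any subject of a property with a declared `rdfs:domain` is inferred to be an instance of that class, and any object is inferred to be an instance of the `rdfs:range` class. The sketch below applies those two rules once over a list of triples. The `SCHEMA` dictionary is a hypothetical, hand-written stand-in for parsed `rdfs:domain`/`rdfs:range` statements, mirroring the domain and range that PostCon declares for `related`.

```python
# Illustrative sketch of the RDFS domain and range entailment rules.
# SCHEMA is a hypothetical stand-in for parsed rdfs:domain/rdfs:range triples.
PSTCN = "http://burningbird.net/postcon/elements/1.0/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# property -> (rdfs:domain, rdfs:range); None means "not declared"
SCHEMA = {
    PSTCN + "related": (PSTCN + "Resource", PSTCN + "Resource"),
}


def infer_types(triples, schema):
    """Return the rdf:type triples implied by each property's domain and range."""
    inferred = set()
    for s, p, o in triples:
        domain, rng = schema.get(p, (None, None))
        if domain is not None:
            inferred.add((s, RDF_TYPE, domain))  # subject is in the domain class
        if rng is not None:
            inferred.add((o, RDF_TYPE, rng))     # object is in the range class
    return inferred


data = [("http://burningbird.net/articles/monsters1.htm", PSTCN + "related",
         "http://burningbird.net/articles/monsters2.htm")]
for triple in sorted(infer_types(data, SCHEMA)):
    print(triple)
```

Because the rules add knowledge instead of rejecting data, an RDFS-aware tool will never report `related` pointing at a non-Resource as an error; it simply concludes that the target is a Resource.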
Determining what is a class and what is a property within the vocabulary is an interesting RDF Schema challenge. Your first reaction might be that an RDFS Class is equivalent to a relational data model Entity, but that doesn’t hold.
In practice, an RDFS Class is any item that can be used in place of an rdf:Description block, giving the node an associated rdf:type, such as Movement or Resource. A resource property, such as bio, Presentation, or Relevancy, is not an RDFS Class.
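The same distinction follows from the "striping" of RDF/XML, where node elements and property elements alternate as you descend the tree. The sketch below is a rough, hypothetical approximation of the classification a tool such as VRP reports, not VRP's actual algorithm: children of `rdf:RDF` are node elements, their children are property elements, and `rdf:parseType="Resource"` skips the implied node so its children are properties again.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">
  <pstcn:Resource rdf:about="monsters1.htm">
    <pstcn:bio rdf:parseType="Resource">
      <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
    </pstcn:bio>
    <pstcn:history>
      <rdf:Seq>
        <rdf:_1 rdf:resource="http://www.yasd.com/dynaearth/monsters1.htm"/>
      </rdf:Seq>
    </pstcn:history>
  </pstcn:Resource>
  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters1.htm">
    <pstcn:movementType>Add</pstcn:movementType>
  </pstcn:Movement>
</rdf:RDF>"""


def local(tag):
    return tag.split("}", 1)[1]


def classify(root):
    """Split element names into classes (node elements) and properties."""
    classes, properties = set(), set()

    def node_element(elem):
        # Typed node elements name a class; rdf:Description carries no type.
        if elem.tag != RDF + "Description":
            classes.add(local(elem.tag))
        for child in elem:
            property_element(child)

    def property_element(elem):
        properties.add(local(elem.tag))
        parse_type = elem.get(RDF + "parseType")
        if parse_type == "Resource":
            # The node element is implied, so the children are properties.
            for child in elem:
                property_element(child)
        elif parse_type != "Literal":
            # Ordinary (or Collection) content: children are node elements.
            for child in elem:
                node_element(child)

    for top in root:  # children of rdf:RDF are node elements
        node_element(top)
    return classes, properties


classes, properties = classify(ET.fromstring(DOC))
print("classes:", sorted(classes))
print("properties:", sorted(properties))
```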
A quick test to double-check your use of RDFS Class versus RDFS Property for an item is to use ICS-FORTH’s Validating RDF Parser (VRP), asking for a graph output on the test RDF/XML document. This tool actually identifies which elements it views as classes and which as properties. This tool is covered in Chapter 7.
To start, define the PostCon vocabulary classes. As Table 6-1 shows, the classes correspond to the main objects in the vocabulary, as you would expect. Using RDFS, then, the main object of the vocabulary, Resource, is defined with the following RDF/XML syntax:
<rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Resource">
  <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  <rdfs:label xml:lang="en"> Web Resource</rdfs:label>
  <rdfs:comment xml:lang="en"> Web resource managed with PostCon System </rdfs:comment>
</rdfs:Class>
This RDF/XML defines Resource as an RDFS class, defined within the schema http://burningbird.net/postcon/elements/1.0/, that is a subclass of the RDFS Resource class (rdfs:Resource). Its human-readable label is Web Resource, and the comment provides a brief description of the item. Both the label and the comment have an xml:lang attribute defining the language. If you're providing multilingual support for your elements, repeat the label and comment, changing the xml:lang attribute value for each language.
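Repeated labels only pay off if the consuming tool picks the right one. The sketch below shows one plausible selection strategy: exact language tag, then the primary subtag (so `en-US` finds `en`), then a default. The French label is a hypothetical addition, and the fallback order is an assumption of this sketch, not behavior defined by RDFS.

```python
import xml.etree.ElementTree as ET

RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The French label is a hypothetical addition to show multilingual labels.
CLASS_DEF = """<rdfs:Class xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  rdf:about="http://burningbird.net/postcon/elements/1.0/Resource">
  <rdfs:label xml:lang="en"> Web Resource</rdfs:label>
  <rdfs:label xml:lang="fr">Ressource Web</rdfs:label>
</rdfs:Class>"""


def label_for(elem, lang, default="en"):
    """Pick an rdfs:label by exact language tag, then primary subtag, then default."""
    labels = {lbl.get(XML_LANG): (lbl.text or "").strip()
              for lbl in elem.findall(RDFS + "label")}
    for candidate in (lang, lang.split("-")[0], default):
        if candidate in labels:
            return labels[candidate]
    return None


cls = ET.fromstring(CLASS_DEF)
print(label_for(cls, "fr"))     # Ressource Web
print(label_for(cls, "en-US"))  # Web Resource
print(label_for(cls, "de"))     # Web Resource (falls back to English)
```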
Though things such as label and comments aren’t necessary for the schema, you should always include these. BrownSauce, a Java-based RDF browser (described in Chapter 7), provides this information to people browsing RDF/XML documents.
This class by itself demonstrates the need for namespaces within RDF/XML; the RDF Schema vocabulary also has a Resource class. The same type of RDFS/XML is also applied to Movement, the vocabulary's other class; all other elements, including bio, relevancy, and presentation, are defined as properties.
Each property within the vocabulary is defined, including providing data type information, human-readable comments and labels, and a definition of the relationship between properties and classes. The latter is particularly important because it provides usage guidelines as well as understanding of the schema.
An example of a property definition for PostCon is the following, for type:
<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/type">
  <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
  <rdfs:label xml:lang="en">Resource Type</rdfs:label>
  <rdfs:comment>Type of Required Resource</rdfs:comment>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>
The type property has a range that determines the kind of values associated with it. In this case, the range is rdfs:Literal, meaning the property takes literal values. This definition declares no domain, so type isn't tied to any particular class; a property can also list one or more rdfs:domain elements naming the classes it's associated with, as the PostCon title property does.
The other properties are defined using almost the same schema, changing the label, comment, and domain as appropriate. The two properties history and related differ from the other properties, though, because they don't describe a literal. For instance, here is the definition for the related property:
<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/related">
  <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
  <rdfs:label xml:lang="en"> Related Resource</rdfs:label>
  <rdfs:comment xml:lang="en"> Resources within PostCon system related to current resource </rdfs:comment>
  <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
  <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>
The object of a related statement is a resource of class pstcn:Resource, as the rdfs:range declares. Other than that, the definition is fairly close to how all the properties are defined.
The complete schema is shown later in this chapter. Running the schema through the RDF Validator confirms that it is valid RDF/XML, although the resulting RDF graph is a bit hard to read because of all the references to the same RDFS classes.
Some of the properties in the schema include an "allowable values are..." note in their comments. RDF Schema currently provides no way to constrain the allowable literal values of a property. However, since the schema is used more for human than for machine interpretation, including this information within the comment is useful.
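Since the schema can't enforce those value lists, an application that cares about them has to check them itself. This is a minimal, hypothetical sketch of such an application-level check; the `ALLOWED` table simply restates the allowable values given in the PostCon schema comments for movementType and currentStatus, and properties without a table entry are not checked.

```python
# Hypothetical application-level check: RDFS cannot restrict literal values,
# so the "allowable values" written in rdfs:comment text are enforced in code.
ALLOWED = {
    "movementType": {"Move", "Add", "Remove"},
    "currentStatus": {"Active", "Inactive"},
}


def check_values(pairs, allowed=ALLOWED):
    """Return the (property, value) pairs whose value is not allowed."""
    return [(prop, value) for prop, value in pairs
            if prop in allowed and value not in allowed[prop]]


statements = [
    ("movementType", "Add"),
    ("movementType", "Deleted"),   # not in the documented list
    ("currentStatus", "Active"),
    ("reason", "New Article"),     # unconstrained property, never flagged
]
print(check_values(statements))  # [('movementType', 'Deleted')]
```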
A vocabulary schema defines vocabulary elements and their relationships with one another and with the RDF and RDFS elements. And because the PostCon schema document is itself a resource, using the PostCon vocabulary elements within the document to detail its creation is perfectly acceptable.
This approach is used within another widely used vocabulary, the Dublin Core (DC), which we will look at next and compare to the PostCon vocabulary. We’ll also find that we can modify PostCon to make use of DC elements, simplifying it.
According to the mission statement, located at http://www.dublincore.org/:
The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI’s activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.
The Dublin Core's primary purpose is to define a metadata model that can be used to describe resources intelligently, so that this information can be used in more efficient and intelligent resource searches, knowledge systems, and so on.
At first glance, this description may make Dublin Core seem like a competitor to RDF, but in reality, the two are highly compatible. Dublin Core is an effort to define the business data of the Web, so to speak. RDF, on the other hand, is a way of recording this metadata so that it can be merged with other metadata defined for other businesses, not just the business of the Web. In other words, RDF is the methodology, and Dublin Core is one business employing the RDF methodology.
Since Dublin Core is an effort to define business data, serializing that data need not be done with RDF. The Dublin Core project provides an RDF/XML version of the data that it has defined, true. But it also provides one in simple, basic XML and one in HTML. However, it is the RDF/XML version we’re interested in and will focus on at this time.
The Dublin Core Metadata Element Set (Version 1.1, found at http://www.dublincore.org/documents/1999/07/02/dces/) consists of a core set of elements that make up what is known as simple Dublin Core. These elements are:
title
A name given to the resource
creator
An entity responsible for making the content of the resource
subject
The topic of the content of the resource
description
An account of the content of the resource
publisher
An entity responsible for making the content available
contributor
An entity responsible for making contributions to the content of the resource
date
A date associated with an event in the life cycle of the resource
type
The nature or genre of the content of the resource
format
The physical or digital manifestation of the resource
identifier
An unambiguous reference to the resource within a given context
source
A reference to the resource from which the present resource is derived
language
A language of the intellectual content of the resource
relation
A reference to a related resource
coverage
The extent or scope of the content of the resource
rights
Information about rights held in and over the resource
Some elements also carry recommended encodings: for example, language values are drawn from the two-letter language codes defined in ISO 639 (such as "en" for English), and date values use the YYYY-MM-DD format.
As you can see immediately, several DC elements could be used in place of PostCon elements. First, though, let’s take a look at Dublin Core implemented as RDF/XML.
The Dublin Core vocabulary is one of the simplest, which is probably one reason it’s so heavily used. The namespace for the elements is at:
http://purl.org/dc/elements/1.1/
If you go to this URL with your browser, you'll see an actual document, with a schema description for each element. The prefix usually given for the Dublin Core namespace within an RDF document is dc, which we'll use in this chapter.
I won’t include the document here, nor will I discuss each element. However, some elements are of particular interest because they seem to map to a PostCon element. And if there’s a way of reducing PostCon, we’ll want to pursue it.
For instance, one element from PostCon that definitely looks to be in DC is title. The Dublin Core title is defined to be "a name given to the resource." Since our definition of title in PostCon is "resource's title," we have a match. Looking at the schema definition for the property, we find:
<rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
  <rdfs:label xml:lang="en-US">Title</rdfs:label>
  <rdfs:comment xml:lang="en-US">A name given to the resource.</rdfs:comment>
  <dc:description xml:lang="en-US">Typically, a Title will be a name by which the resource is formally known.</dc:description>
  <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/" />
  <dcterms:issued>1999-07-02</dcterms:issued>
</rdf:Property>
There are some differences between this and the original PostCon title schema definition. For instance, the schema for the PostCon title listed the property's domain (that is, the acceptable context for the property) as the pstcn:Resource class (and indirectly Movement, which is a subclass of pstcn:Resource). The DC definition doesn't list domains because it doesn't seek to limit the classes the property can be used with, opening the door for us to use the property in PostCon.
Another difference is that a DC element, dc:description, is used directly within the definition to describe the property. Again, this won't adversely impact the use of title in PostCon; in fact, the additional information is helpful. Finally, there is a property from a different namespace: dcterms:issued. Before we can determine whether this property will limit our use of title in PostCon, we'll have to take a closer look at this new schema.
For more on Dublin Core in RDF/XML, see the pending recommendation “Expressing Simple Dublin Core in RDF/XML,” authored by Dave Beckett, Eric Miller, and Dan Brickley, and found at http://www.dublincore.org/documents/2001/11/28/dcmes-xml/.
All of the Dublin Core metadata elements are properties within the context of RDF. Within an RDF graph, that means that all of them radiate out from a single resource. Again, this makes the vocabulary attractive to use because it is so simple and uncomplicated. However, there are basic limitations to how broadly one can stretch any one element to meet a specific use. And by stretching meanings at all, we lose some refinement.
Sure, we can group all dates together, but do we want to?
So, the Dublin Core Working Group set out to define a set of qualifiers that limit or modify the meaning of the DC elements. Additionally, the group determined that the qualifiers belonged in one of two different categories: qualifiers for element refinement and qualifiers for encoding schema.
Element refinement qualifiers restrict the scope of an element. For instance, there is the general concept of date, and then there is creation date (from PostCon), modified date, and so on. Vocabularies that want such refinements can use modified date and creation date. However, vocabularies (or applications) that don't care about the refinement can ignore it and just treat the qualified elements as date.
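In RDF terms, this "ignore the refinement" behavior amounts to following `rdfs:subPropertyOf` up to the general element. The sketch below does that for a handful of qualified Dublin Core terms; the mapping reflects subproperty links published in the DCMI terms schema (for example, `dcterms:created` refines `dc:date`, as its definition in this chapter shows), while the helper function itself is hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# A few rdfs:subPropertyOf links published in the qualified Dublin Core schema.
SUBPROPERTY_OF = {
    DCTERMS + "created": DC + "date",
    DCTERMS + "issued": DC + "date",
    DCTERMS + "valid": DC + "date",
    DCTERMS + "abstract": DC + "description",
    DCTERMS + "references": DC + "relation",
}


def dumb_down(triples, parents=SUBPROPERTY_OF):
    """Replace each refined property with its most general known ancestor."""
    result = []
    for s, p, o in triples:
        seen = set()
        while p in parents and p not in seen:  # guard against accidental cycles
            seen.add(p)
            p = parents[p]
        result.append((s, p, o))
    return result


doc = "http://burningbird.net/articles/monsters1.htm"
refined = [
    (doc, DCTERMS + "created", "1999-08-01T00:00:00-06:00"),
    (doc, DCTERMS + "valid", "2003-12-01T00:00:00-06:00"),
    (doc, DC + "title", "Tale of Two Monsters: Legends"),  # already general
]
for triple in dumb_down(refined):
    print(triple)
```

An application that understands only simple Dublin Core sees two `dc:date` values; it loses the distinction between them but still gets correct data.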
Element refinement qualifiers are based on the business of the schema rather than its implementation. Encoding schema qualifiers, though, exist purely to help with parsing and interpreting the data. Again, date can be written in many ways. By using encoding schema qualifiers, there's no confusion about what format to expect for data within a specific date field.
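As an example of what an encoding qualifier buys a parser, DCMI defines `dcterms:W3CDTF` (DCMI calls these qualifiers encoding schemes), which points at the W3C Date and Time Formats profile of ISO 8601. The sketch below checks only the *form* of a date literal against those profiles with a regular expression; it does not validate calendar ranges such as month 13, and the function name is hypothetical.

```python
import re

# Profiles of the W3C Date and Time Formats note (W3CDTF): YYYY, YYYY-MM,
# YYYY-MM-DD, and a full date plus hh:mm[:ss[.s]] with a time zone designator.
W3CDTF = re.compile(
    r"\d{4}"
    r"(?:-\d{2}"
    r"(?:-\d{2}"
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2}))?"
    r")?)?"
)


def is_w3cdtf(value):
    """Check the form of a date literal; calendar ranges are not validated."""
    return W3CDTF.fullmatch(value.strip()) is not None


for value in ["1999-08-01T00:00:00-06:00", "2002-11-01", "1999",
              "1999-10-31:T00:00:00-05:00"]:   # stray colon before the T
    print(value, is_w3cdtf(value))
```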
When looking at Dublin Core, we can see uses for several of the elements, but when we look at the qualified Dublin Core implemented in RDF/XML, we find a strong match for several PostCon classes and properties.
First, the namespace for the qualified Dublin Core Schema is at http://purl.org/dc/terms/. The namespace prefix for the qualified Dublin Core is usually dcterms.
The first property that attracts attention is created, a qualifier on the date property. The created definition is:
<rdf:Property rdf:about="http://purl.org/dc/terms/created">
  <rdfs:label>Created</rdfs:label>
  <rdfs:comment>Date of creation of the resource.</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="http://purl.org/dc/elements/1.1/date" />
  <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/terms/" />
</rdf:Property>
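The thing to focus on is the comment "Date of creation of the resource." This exactly matches the description for the pstcn:creationDate property in PostCon. In the last section, we weren't sure how to handle dcterms:issued, but now we know it's nothing more than another refinement of date: the date of formal issuance. In the DC title definition, it records when the title term itself was issued (1999-07-02); it describes the property definition rather than constraining title values, so it won't limit our use of title in PostCon.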
Another set of properties that seemed similar to PostCon elements is the DC relation property and its qualified refinements: dcterms:isReplacedBy, dcterms:references, and so on. They aren't used to replace PostCon's related property (and associated Resource class), though, because the DC properties have built-in semantics that don't encompass all of PostCon's related property semantics. However, PostCon's pstcn:dependencies and DC's qualifier dcterms:requires seem to be a good match.
At first glance, both the original Dublin Core elements and the qualified element set offer good replacements for, or additions to, the PostCon vocabulary. And since both are defined within RDF, it will be simple to use them together in RDF/XML documents.
After this first look at the Dublin Core elements, I decided to replace the PostCon properties demonstrated in this chapter with matching DC elements. These include the following replacements:
pstcn:title → dc:title
pstcn:author → dc:creator
pstcn:owner → dc:publisher
pstcn:abstract → dcterms:abstract
pstcn:description → dc:description
pstcn:creationDate → dcterms:created
pstcn:date → dc:date
I also decided to add the dc:format property to record the resource's file type. These are small changes, but they reduce the size of the PostCon vocabulary and make it easier to share data about these items.
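For existing PostCon documents, the replacement list can be applied mechanically by renaming property elements. The following is a minimal sketch with Python's `xml.etree.ElementTree`; the `REPLACEMENTS` table follows the replacement list, with created placed in the dcterms namespace (http://purl.org/dc/terms/), where its schema definition lives. The `migrate` helper is hypothetical and only renames elements; it doesn't add namespace declarations for prefixes it doesn't already know or touch literal values.

```python
import xml.etree.ElementTree as ET

PSTCN_NS = "http://burningbird.net/postcon/elements/1.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Register the usual prefixes so serialized output uses pstcn:, dc:, dcterms:.
ET.register_namespace("pstcn", PSTCN_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("dcterms", DCTERMS_NS)
ET.register_namespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")


def clark(ns, local):
    return "{" + ns + "}" + local


# The PostCon-to-Dublin Core replacements, as element tags in Clark notation.
REPLACEMENTS = {
    clark(PSTCN_NS, "title"): clark(DC_NS, "title"),
    clark(PSTCN_NS, "author"): clark(DC_NS, "creator"),
    clark(PSTCN_NS, "owner"): clark(DC_NS, "publisher"),
    clark(PSTCN_NS, "abstract"): clark(DCTERMS_NS, "abstract"),
    clark(PSTCN_NS, "description"): clark(DC_NS, "description"),
    clark(PSTCN_NS, "creationDate"): clark(DCTERMS_NS, "created"),
    clark(PSTCN_NS, "date"): clark(DC_NS, "date"),
}


def migrate(root, mapping=REPLACEMENTS):
    """Rename PostCon property elements in place; return how many changed."""
    changed = 0
    for elem in root.iter():
        new_tag = mapping.get(elem.tag)
        if new_tag is not None:
            elem.tag = new_tag
            changed += 1
    return changed


BIO = """<pstcn:bio xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdf:parseType="Resource">
  <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
  <pstcn:author>Shelley Powers</pstcn:author>
  <pstcn:owner>Burningbird Network</pstcn:owner>
</pstcn:bio>"""

bio = ET.fromstring(BIO)
print(migrate(bio))  # 3
print(ET.tostring(bio, encoding="unicode"))
```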
To see how these two vocabularies work together, the RDF/XML for the sample monsters1.htm resource is provided in Example 6-6.
The Dublin Core Schema namespaces are added to the top-level RDF element, and the dc and dcterms properties are used in place of the now-removed PostCon properties. In addition, the relevancy and presentation blocks have been added to complete the document.
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:base="http://burningbird.net/articles/">

  <pstcn:Resource rdf:about="monsters1.htm">

    <!--Resource biographical information-->
    <pstcn:bio rdf:parseType="Resource">
      <dc:title>Tale of Two Monsters: Legends</dc:title>
      <dcterms:abstract> When I think of "monsters" I think of the creatures of legends and tales, from the books and movies, and I think of the creatures that have entertained me for years. </dcterms:abstract>
      <dc:description> Part 1 of four-part series on cryptozoology, legends, Nessie the Loch Ness Monster and the giant squid. </dc:description>
      <dcterms:created>1999-08-01T00:00:00-06:00</dcterms:created>
      <dc:creator>Shelley Powers</dc:creator>
      <dc:publisher>Burningbird Network</dc:publisher>
    </pstcn:bio>

    <!--Resource's relevancy at time RDF/XML document was built-->
    <pstcn:relevancy rdf:parseType="Resource">
      <pstcn:currentStatus>Active</pstcn:currentStatus>
      <dcterms:valid>2003-12-01T00:00:00-06:00</dcterms:valid>
      <dc:subject>legends</dc:subject>
      <dc:subject>giant squid</dc:subject>
      <dc:subject>Loch Ness Monster</dc:subject>
      <dc:subject>Architeuthis Dux</dc:subject>
      <dc:subject>Nessie</dc:subject>
      <dcterms:isReferencedBy rdf:resource="http://www.pibburns.com/cryptozo.htm" />
      <dcterms:references rdf:resource="http://www.nrcc.utmb.edu/" />
    </pstcn:relevancy>

    <!--Presentation/consumption information about resource-->
    <pstcn:presentation rdf:parseType="Resource">
      <dc:format>text/html</dc:format>
      <dcterms:conformsTo>XHTML 1.0 Strict</dcterms:conformsTo>
      <dcterms:conformsTo>CSS Validation</dcterms:conformsTo>
      <dcterms:requires>HTML User agent</dcterms:requires>
      <pstcn:requires rdf:parseType="Resource">
        <pstcn:type>stylesheet</pstcn:type>
        <rdf:value>http://burningbird.net/de.css</rdf:value>
      </pstcn:requires>
      <pstcn:requires rdf:parseType="Resource">
        <pstcn:type>logo</pstcn:type>
        <rdf:value>http://burningbird.net/mm/dynamicearth.jpg</rdf:value>
      </pstcn:requires>
    </pstcn:presentation>

    <!--History of events of resource-->
    <pstcn:history>
      <rdf:Seq>
        <rdf:_1 rdf:resource="http://www.yasd.com/dynaearth/monsters1.htm" />
        <rdf:_2 rdf:resource="http://www.dynamicearth.com/articles/monsters1.htm" />
        <rdf:_3 rdf:resource="http://burningbird.net/articles/monsters1.htm" />
      </rdf:Seq>
    </pstcn:history>

    <!--Resources internal to PostCon that are related to resource-->
    <pstcn:related rdf:resource="monsters2.htm" />
    <pstcn:related rdf:resource="monsters3.htm" />
    <pstcn:related rdf:resource="monsters4.htm" />

  </pstcn:Resource>

  <!--Related resources-->
  <pstcn:Resource rdf:about="monsters2.htm">
    <dc:title>Cryptozoology</dc:title>
    <pstcn:reason>First in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

  <pstcn:Resource rdf:about="monsters3.htm">
    <dc:title>A Tale of Two Monsters: Architeuthis Dux (Giant Squid)</dc:title>
    <pstcn:reason>Second in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

  <pstcn:Resource rdf:about="monsters4.htm">
    <dc:title>Nessie, the Loch Ness Monster </dc:title>
    <pstcn:reason>Fourth in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

  <!--Resource events-->
  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters1.htm">
    <pstcn:movementType>Add</pstcn:movementType>
    <pstcn:reason>New Article</pstcn:reason>
    <dc:date>1998-01-01T00:00:00-05:00</dc:date>
  </pstcn:Movement>

  <pstcn:Movement rdf:about="http://www.dynamicearth.com/articles/monsters1.htm">
    <pstcn:movementType>Move</pstcn:movementType>
    <pstcn:reason>Moved to separate dynamicearth.com domain</pstcn:reason>
    <dc:date>1999-10-31T00:00:00-05:00</dc:date>
  </pstcn:Movement>

  <pstcn:Movement rdf:about="http://burningbird.net/articles/monsters1.htm">
    <pstcn:movementType>Move</pstcn:movementType>
    <pstcn:reason>Collapsed into Burningbird</pstcn:reason>
    <dc:date>2002-11-01</dc:date>
  </pstcn:Movement>

</rdf:RDF>
Running this document through the RDF Validator generates the expected RDF graph and no errors.
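Before submitting a document to the RDF Validator, a local well-formedness check catches the cheapest class of mistakes, such as a closing tag written as `</rdfs: comment>`. This sketch uses only Python's standard XML parser, and the helper name is hypothetical; it confirms that the text is well-formed XML, not that it is valid RDF/XML, so the validator is still needed for the graph itself.

```python
import xml.etree.ElementTree as ET


def well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


GOOD = '<rdfs:comment xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">Reason</rdfs:comment>'
BAD = '<rdfs:comment xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">Reason</rdfs: comment>'

print(well_formed(GOOD))    # (True, None)
print(well_formed(BAD)[0])  # False
```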
One thing this exercise demonstrates is the value of keeping a vocabulary small and then adding to it. As you saw with Dublin Core, the group started with a small set of important elements and then extended it with a set of qualifier elements. This is a good approach to follow with your own vocabularies, and it's the approach that other groups, such as the RSS Working Group (discussed in Chapter 13), have used. A small core makes others more likely to use your vocabulary and reduces the chance that it will need to be modified later. The complete RDF Schema for PostCon, after the Dublin Core elements have been identified, is actually quite small. It's shown in its entirety in Example 6-7.
<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

  <rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Resource">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en"> Web Resource</rdfs:label>
    <rdfs:comment xml:lang="en"> Web resource managed with PostCon system </rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  </rdfs:Class>

  <rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Movement">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en"> Web Resource Movement</rdfs:label>
    <rdfs:comment xml:lang="en"> An event for the resource within the PostCon system </rdfs:comment>
  </rdfs:Class>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/bio">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en">Resource biography</rdfs:label>
    <rdfs:comment xml:lang="en"> Biographical information for resource </rdfs:comment>
    <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
    <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/relevancy">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en">Resource Relevancy</rdfs:label>
    <rdfs:comment xml:lang="en"> Information related to relevancy of resource </rdfs:comment>
    <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
    <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/presentation">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en">Resource Presentation</rdfs:label>
    <rdfs:comment xml:lang="en"> Presentation and consumption information about resource </rdfs:comment>
    <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
    <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/history">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en"> Web Content History</rdfs:label>
    <rdfs:comment xml:lang="en"> History of movement of content within system </rdfs:comment>
    <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
    <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/currentStatus">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en">Current Status</rdfs:label>
    <rdfs:comment>Current status of document (allowable values of Active and Inactive)</rdfs:comment>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
    <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Relevancy"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/reason">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en">Reason</rdfs:label>
    <rdfs:comment>Reason</rdfs:comment>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
    <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/movementType">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en">Movement Type</rdfs:label>
    <rdfs:comment>Type of Movement (allowable values of Move, Add, Remove)</rdfs:comment>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
    <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Movement"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/related">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en"> Related Resource</rdfs:label>
    <rdfs:comment xml:lang="en"> Resources within PostCon system related to current resource </rdfs:comment>
    <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/requires">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en">Resource Requirement</rdfs:label>
    <rdfs:comment xml:lang="en"> External resource required by current resource </rdfs:comment>
    <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/type">
    <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
    <rdfs:label xml:lang="en">Resource Type</rdfs:label>
    <rdfs:comment>Type of Required Resource</rdfs:comment>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>

</rdf:RDF>
The schema is in RDF/XML and can be validated. Once validated, it can be embedded within an HTML or XHTML document placed at the schema URI, or left as a pure RDF/XML document at that same location. This isn’t required, but the main reason for doing it is to give people the opportunity to review the schema and better understand the vocabulary. Another reason is that some tools, such as BrownSauce (which we’ll look at in detail in Chapter 7), use the schema to provide better information about the RDF graph.
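To make that last point concrete, here is a minimal Python sketch of how a tool might read a vocabulary schema and build a lookup of the human-readable label, comment, domain, and range for each class and property. This is the kind of information a browser can show alongside raw URIs. It uses only the standard library’s xml.etree.ElementTree and runs against a small inline excerpt of the PostCon schema. The function name load_labels is hypothetical, and this is not a description of how BrownSauce itself is implemented.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A two-definition excerpt of the PostCon schema, used as sample input.
SCHEMA = """<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Movement">
    <rdfs:label xml:lang="en">Web Resource Movement</rdfs:label>
    <rdfs:comment xml:lang="en">An event for the resource within the PostCon system</rdfs:comment>
  </rdfs:Class>
  <rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/movementType">
    <rdfs:label xml:lang="en">Movement Type</rdfs:label>
    <rdfs:comment>Type of Movement (allowable values of Move, Add, Remove)</rdfs:comment>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
    <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Movement"/>
  </rdf:Property>
</rdf:RDF>"""


def load_labels(schema_xml):
    """Map each class/property URI to its kind, label, comment, domain, range."""
    root = ET.fromstring(schema_xml)
    info = {}
    for elem in root:
        uri = elem.get(f"{{{RDF}}}about")
        if uri is None:
            continue
        # "{namespace}Class" -> "Class", "{namespace}Property" -> "Property"
        entry = {"kind": elem.tag.split("}")[1]}
        for child in elem:
            name = child.tag.split("}")[1]  # label, comment, domain, range, ...
            resource = child.get(f"{{{RDF}}}resource")
            # Resource-valued statements keep the URI; literals keep the text.
            entry[name] = resource if resource is not None else (child.text or "").strip()
        info[uri] = entry
    return info


labels = load_labels(SCHEMA)
prop = labels["http://burningbird.net/postcon/elements/1.0/movementType"]
print(prop["label"], "->", prop["domain"])
```

A real schema-aware tool would work on the parsed RDF graph rather than on the XML tree, since RDF/XML allows several equivalent serializations; walking the XML directly only works because this schema sticks to one simple, regular layout.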
Much about a document can be determined directly from the document itself. The format, location, subject, author, copyright, and so on can all be derived by scraping the HTML for a particular web resource, particularly its meta tags.
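As a rough illustration of that kind of scraping, the following minimal sketch uses Python’s standard html.parser module to collect the title element and every name/content pair from meta tags. The class name MetaScraper and the sample page, including its particular meta names, are hypothetical; real pages vary widely in which meta tags they carry, if any.

```python
from html.parser import HTMLParser


class MetaScraper(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            # Meta names are case-insensitive in HTML, so normalize them.
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


# Hypothetical page head, for illustration only.
PAGE = """<html><head>
<title>Tale of Two Monsters: Architeuthis Dux</title>
<meta name="Author" content="Shelley Powers">
<meta name="description" content="The Giant Squid and its relationship to mythology.">
<meta name="keywords" content="Science; Oceanography">
</head><body><p>Article text</p></body></html>"""

scraper = MetaScraper()
scraper.feed(PAGE)
print(scraper.title)           # Tale of Two Monsters: Architeuthis Dux
print(scraper.meta["author"])  # Shelley Powers
```

Anything not present in the markup, such as the publisher or a document type, has to be guessed or supplied by a person, which is why a generator that works this way still needs a form for corrections.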
Based on this, an organization going by the abbreviation UKOLN, at the University of Bath in the UK, created the DC-dot generator. This online application scrapes a web resource, pulls whatever information it can from it, and then returns the result in multiple formats, including RDF, XHTML meta tags, and plain XML.
Access DC-dot at http://www.ukoln.ac.uk/metadata/dcdot/.
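Producing several output formats from one set of scraped values is straightforward. The minimal Python sketch below takes a list of Dublin Core (element, value) pairs and emits them both as an RDF/XML rdf:Description (using the qualified rdf:about) and as XHTML link and meta tags that follow the DCMI “DC.element” naming convention. The function names to_rdf_xml and to_xhtml_meta are hypothetical, the values are taken from the “Tale of Two Monsters” article, and DC-dot’s actual output may differ in its details.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Use the conventional prefixes instead of ElementTree's ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def to_rdf_xml(uri, fields):
    """Serialize (element, value) pairs as one DC rdf:Description."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    for name, value in fields:
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def to_xhtml_meta(fields):
    """Serialize the same pairs as XHTML link/meta tags (DC.element names)."""
    lines = [f'<link rel="schema.DC" href="{DC}" />']
    for name, value in fields:
        lines.append(f'<meta name="DC.{name}" content="{escape(value)}" />')
    return "\n".join(lines)


# Repeated elements, such as the two dc:format values, are allowed.
fields = [
    ("title", "Tale of Two Monsters: Architeuthis Dux"),
    ("creator", "Shelley Powers"),
    ("format", "text/html"),
    ("format", "8287 bytes"),
]
rdf_out = to_rdf_xml("http://burningbird.net/articles/monsters3.htm", fields)
meta_out = to_xhtml_meta(fields)
print(rdf_out)
print(meta_out)
```

A list of pairs is used instead of a dictionary because Dublin Core elements are repeatable, and a dictionary would silently keep only the last dc:format value.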
I decided to try this with the sample “Tale of Two Monsters” article. On the first page of the application, I entered the URL for the document and checked both boxes, so that the tool would attempt to determine the publisher and return RDF. The returned page contains a first guess at the RDF/XML and provides a form you can use to modify the generated DC elements. Figure 6-4 displays the form you can use to modify the results.
With some modifications, the DC RDF/XML document generated is shown in Example 6-8.
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description about="http://burningbird.net/articles/monsters3.htm">
<dc:title>
Tale of Two Monsters: Architeuthis Dux
</dc:title>
<dc:creator>
Shelley Powers
</dc:creator>
<dc:subject>
Internet; Web; Computers; Software; Technology;
Meteorology; Geology; Oceanography; Astronomy; Math;
Science; Physics; P2P
</dc:subject>
<dc:description>
The Giant Squid and its relationship to mythology.
</dc:description>
<dc:publisher>
Burningbird
</dc:publisher>
<dc:date>
2002-01-20
</dc:date>
<dc:type>
Text
</dc:type>
<dc:format>
text/html
</dc:format>
<dc:format>
8287 bytes
</dc:format>
</rdf:Description>
</rdf:RDF>
The generated RDF/XML validates with the RDF Validator, with one exception: the generator uses an unqualified about attribute on the rdf:Description element. Unqualified about is still allowed for existing vocabularies, but it is discouraged for new vocabularies and RDF/XML instances. Fortunately, the fix is quick: change about to rdf:about.
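Because the fix is mechanical, it is easy to script. The following minimal Python sketch (standard library only; the function name qualify_about is hypothetical) renames any unqualified about attribute on rdf:Description elements to rdf:about. It works on an abbreviated copy of the generated document. ElementTree drops the DOCTYPE declaration when it re-serializes, so the excerpt leaves it out, and a real tool would need to restore it if it is still wanted.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Keep the familiar rdf: and dc: prefixes in the rewritten output.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

# Abbreviated generator output, still using the unqualified about attribute.
DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://burningbird.net/articles/monsters3.htm">
    <dc:title>Tale of Two Monsters: Architeuthis Dux</dc:title>
    <dc:creator>Shelley Powers</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def qualify_about(xml_text):
    """Replace unqualified 'about' with rdf:about on every rdf:Description."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        # Unprefixed attributes carry no namespace, so the key is plain "about".
        if "about" in desc.attrib:
            desc.set(f"{{{RDF}}}about", desc.attrib.pop("about"))
    return ET.tostring(root, encoding="unicode")


fixed = qualify_about(DOC)
print(fixed)
```

Running the rewritten output back through the RDF Validator is still a sensible final check, since this script only touches the one attribute.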
Now that you’ve had a chance to try out RDF/XML, it’s time to try out a few of the many, many tools and utilities and APIs that have been created specifically for processing RDF/XML.