Chapter 6. Creating an RDF Vocabulary

Unlike other W3C specifications, such as HTML, you’re not going to see RDF documents consisting solely of the elements that have been described in Chapter 3 through Chapter 5. Yes, there is a defined syntax for RDF, as reviewed in Chapter 3 and Chapter 4, and there is an RDF Schema, explored in Chapter 5. However, RDF isn’t used to model business-specific resources directly because there are no domain-specific elements within the specification. Instead, RDF is used to create domain-specific vocabularies that are then used to model the resources, with the added advantage of access to RDF-specific parsers and automated processes.

What kinds of vocabularies can be created? A better question is: what kinds of business resources can be described using a syntax/schema such as RDF? And the answer is: any business resource. The number of possible vocabularies is limitless, constrained only by each industry’s need for interoperable vocabularies.

In this chapter you’ll have a chance to see how a vocabulary is created and validated against the RDF syntax and schema. Once the elements for the vocabulary are defined, they’ll then be compared against an existing web resource domain vocabulary, the Dublin Core, to look for matches.

First, though, let’s take a closer look at what I mean when I say “RDF Vocabulary.”

How RDF Vocabularies Differ from XML Vocabularies

RDF is a way of recording information about resources; RDF, as serialized using XML, is a way of recording information about a specific business domain using a set of elements defined within the rules of the RDF data model/graph and the constraints of the RDF syntax, vocabulary, and semantics.

RDF recorded in XML is a very powerful tool—it’s been used to document events within a heterogeneous application environment, to describe publications, to record an environmental thesaurus, and so on. By using XML, you have access to a great number of existing XML applications such as parsers and APIs, even relational and Lightweight Directory Access Protocol (LDAP) data sources that are XML-capable. However, what do you get when you use RDF? Why not use XML directly?

As mentioned in previous chapters, RDF provides the same level of functionality to XML as the relational data model adds to commercial database systems. RDF provides a predefined grammar that can be used to consistently record business domain information in such a way that any business domain can have a vocabulary in RDF that can be processed with a host of RDF-based tools and frameworks.

Consider the environmental thesaurus I just mentioned. This is a joint effort between the California Environmental Resource Evaluation System (CERES) and the National Biological Information Infrastructure (NBII). This partnership was formed to create a common environmental vocabulary and the tools necessary to work with this vocabulary. One of the efforts of this project is to document this vocabulary using RDF.

Within the RDF vocabulary, the project has defined a class called Term that has several properties, such as Source, Category, and Status, attached to it. Instead of using RDF, the project could have recorded this information directly within XML; however, if they did this, they then would have to define the concept of “class” and “property” in order to record relationships such as “Source is a property of Term.” In addition, the project would also have to create code to process the XML in such a way that the Source element is processed as a property of Term rather than an arbitrary related element that happens to be nested within the Term element. Lastly, the group would need to create a schema to support these new objects so that the XML document matches the constraints documented in this schema.

For the latter requirement, a Document Type Definition (DTD) file won’t work, as DTDs primarily control nesting and frequency of occurrence of elements; XML Schema won’t work, as it is concerned more with data types and other constraints rather than the metalanguage nature of “class” and “property.” RELAX NG is more easily processed than either of those, but again it is solving different problems.

Just as you can use XML to serialize the contents of a relational database, you can use XML to serialize the contents of an RDF-based model. XML isn’t a replacement for RDF, however, because XML is nothing more than a syntax. You need a metalanguage vocabulary to be able to use XML to record business domain information in such a way that any business can be documented, and RDF provides this capability.

However, don’t take my word for it; try it yourself in the next several sections when you have a chance to see how a vocabulary is created.

Defining the Vocabulary: Business and Scope

As the Web has matured, more and more of the posted content is aging beyond usefulness. In many cases, this aged content is just deleted from a web site, resulting in “404 Page not found” errors when you click through to the content from some search engine or via a link from another web page. Hitting a missing page is particularly frustrating if you’ve come to the page because of a description associated with it that exactly fits your current interest, and you don’t even know why the page was deleted or if the resource might exist somewhere else.

A further problem with maturing web sites is that site structure doesn’t remain constant—due to the use of new technologies or new directions in content management, resources may be moved around at the site or even moved to new domains. When you access the content, the less-than-helpful sites return with something along the lines of:

404 Not Found 
We're sorry, the file that you requested does not exist or has moved.

Well, which is it? Is the page missing, or was the request invalid because the content’s moved? If you get this message as a result of clicking on a link from another site, is it because the content’s really been deleted or moved, or because the linking site made a mistake with the link? Is the site that owns the content using a new system of cataloging its resources, breaking existing links?

Other sites provide a page with a forwarding message and a link to redirect you to the new content. As important as these redirections are, though, the reasons behind the move may be additional information that can be useful in determining whether the resource is worth pursuing through what could end up being a chain of redirections, with each link in the chain reflecting a different move.

Unfortunately, the reasons for the move aren’t maintained with the redirection in most cases.

Another problem is aging content that isn’t deleted. With this type of page, you could be halfway through reading it only to realize that it talks about a product or technology that’s been obsolete for years. There’s nothing to indicate the relevance of the page, and external factors associated with the page, such as the page title or label, may not provide enough context to determine whether the resource is useful for your purposes or not.

Tip

Netscape’s support of Dynamic HTML (DHTML) for the company’s browser is a classic case of content being under one label—DHTML—with two drastically different implementations based on browser version. DHTML for Version 4.x of Netscape won’t work with the current Netscape 6.x products and vice versa. The only way to determine whether a page titled “Working with DHTML in Netscape” is useful for your purposes is to read it and hope you know enough about the subject to know whether you’re wasting your time.

Content management systems such as FrontPage, Vignette, and others help with creating, posting, and managing the original content, but do not help provide information about the context of the resource. HTML meta tags can be attached to each HTML resource to provide copyright information, keywords, or authorship, but they say nothing about the life expectancy of the resource or its move history, including reasons for the move, unless you put this information into the description, an approach that isn’t standardized and therefore not useful.

These systems are as helpless as web browsers at determining whether a 404 error occurred because of a typo, a relocation, or a resource no longer being maintained at the site.

What’s needed is a content system that takes over after the content management systems have finished their task of posting the content: a postcontent information system that can be accessed by a runtime application and provide information about the resource to the resource consumers. Such a system must provide information that is useful for humans and is also usable by automated processes.

We’ll use this type of system to demonstrate how to create an RDF vocabulary and, eventually, how to use the vocabulary just created. For simplicity in this chapter (and later in the book), I’ll refer to this system as PostCon.

Defining the Vocabulary: Elements

How do you start defining the vocabulary for this type of system? As with most application efforts, the first step in creating the vocabulary is to define the business domain elements and their properties of interest within the given business scope.

The PostCon Domain Elements

Defining the business elements for a new system is the same process whether the domain is being defined for use within a more traditional relational database or within a system with data defined and managed through RDF-capable processes. Following from existing data modeling techniques, you first describe the major entities and their properties, then describe how these entities are related to one another.

PostCon has one major or root element, the web site resource; the system is interested in this resource from six different perspectives:

  • What is the content’s bio—who wrote it, who owns it, when was it created, and what are its subject and topic?

  • What is the content’s relevancy—has it been updated for new circumstances and does it have a date beyond which it is no longer pertinent?

  • What is the content’s history of movement—has it been deleted? If so, why? Has it moved? If so, why, and where is it now?

  • What are the content’s related resources—has it been replaced? Are other resources related to it? Are other resources dependent on it, or is it dependent on other resources?

  • If the resource no longer exists, are there replacements? Why are they replacements?

  • What are the presentation characteristics of the content? Its type? Does it conform to any standard? Does it require specialized user agents? Are there any dependencies?

The set of PostCon objects consists of a web resource, its bio, a movement associated with the resource, presentation and type information, and other related resources. Each object is then described by a set of properties. Many of these are compatible with HTML meta tag elements such as Title and Content and should be synchronized with the values included within the HTML; others are unique to the system.

The main system elements are then described by a set of properties, as defined in Table 6-1.

Table 6-1. PostCon system domain elements and their properties

| Element | Property | Description |
|---|---|---|
| Content | Unique Content ID | To identify content |
| | Biography | Content biographical information |
| | Relevancy | Relevancy of content |
| | History | History of content movement |
| | Related | Related content |
| | Presentation | Content type and presentation |
| Content bio | Title | Resource’s title |
| | Resource Abstract | Excerpt from resource if applicable |
| | Resource Description | Description of Resource |
| | Creation Date | Date resource was first created |
| | Content Author | Person or organization responsible for creating content |
| | Content Owner | Person or organization who owns copyright on content |
| Relevancy | Content Status | Current status of content |
| | Subject | Subject/topic of resource (may duplicate) |
| | Relevancy Expiration | Date when content is aged beyond usefulness |
| | References | External resources referenced in content |
| | Referenced by | External resources that reference content |
| History | Movement | Location at end of movement |
| | Reason | Reason for movement |
| | Date | Date of movement |
| | Type | Type of movement |
| Related | Related Resource | Resource URI |
| | Reason | Reason for relationship |
| Recommendation | Recommended Resource | URI of recommended replacement |
| | Title | Title of replacement |
| | Reason | Reason for recommendation |
| Presentation | Format | Format of resource |
| | Conformity | Standards/specifications resource conforms to (may repeat) |
| | Requires | Resource dependencies (may repeat); may have associated type of requirement as well as required resource (may repeat) |

The Unique Resource ID (URI) is defined once for the content and follows it regardless of the content’s current location. The Resource Title property is equivalent to the HTML Title element, and the Resource Description is equivalent to the Description meta tag, which contains a short abstract of the resource’s contents:

<meta name="description" content="Dynamic Earth site focuses on 
science and the world and universe around us. You can never know too much">

The material within the content attribute is used for the Resource Description content. The Content Author is equivalent to the Author meta tag, and the Content Owner is equivalent to the Copyright meta tag:

<meta name="author" content="Shelley Powers">
<meta name="copyright" content="&copy; 1997-2003 Burningbird">

The Content Status for the web resource contains information about the current status of the document, such as whether it has been deleted or is still active. The Relevancy Expiration is a date when the content author expects the resource contents to become dated and no longer viable. The Requires property also provides information about the viability of the content, such as being dependent on Version 1.0 of a specific product release.

The History of the resource tracks its movement throughout the network, as well as the date and reason for the move. This is particularly useful when providing information about deleted content. The Related material provides information about replacement URLs for content that is no longer viable, and the Recommendation material covers additional recommended material complementary to the material, while the Presentation reflects information necessary to “consume” the resource, as it were.

For a specific web resource, there is one each of the Resource bio, Relevancy, History, and Presentation sections, but there can be many related items. Additionally, within the History section there can be many movements. This structure and the domain information are then used to prototype the RDF vocabulary, as described next.
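To make these cardinalities concrete, here is a minimal sketch of the domain model as Python dataclasses. The class and field names are my own illustrative choices rather than part of the PostCon vocabulary, the relevancy and presentation groups are left out for brevity, and the movement values are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bio:
    title: str
    date_created: str
    author: str
    owner: str


@dataclass
class Movement:
    location: str        # location at end of movement
    reason: str
    date: str
    movement_type: str


@dataclass
class Related:
    uri: str
    reason: str


@dataclass
class WebResource:
    content_id: str      # the URI that follows the content around
    bio: Bio             # exactly one bio per resource
    movements: List[Movement] = field(default_factory=list)  # zero or more
    related: List[Related] = field(default_factory=list)     # zero or more


resource = WebResource(
    content_id="http://burningbird.net/articles/monsters1.htm",
    bio=Bio(
        title="Tale of Two Monsters: Legends",
        date_created="1999-08-01T00:00:00-06:00",
        author="Shelley Powers",
        owner="Burningbird Network",
    ),
)

# Placeholder movement, invented purely to show the one-to-many relationship.
resource.movements.append(
    Movement(
        location="http://example.org/placeholder.htm",
        reason="placeholder reason",
        date="2000-01-01",
        movement_type="placeholder",
    )
)
resource.related.append(
    Related(
        uri="http://burningbird.net/articles/monsters2.htm",
        reason="First in the Tale of Two Monsters series.",
    )
)

print(len(resource.movements), len(resource.related))
```

The single-valued groups are plain fields, while the repeating ones are lists; that distinction is what the RDF prototype has to preserve.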

Prototyping the Vocabulary

Before creating a formal RDFS document for the new vocabulary, you should prototype the model with several different instances of it, to ensure that the results corroborate the expected outcome. During this process, check the validity of your data with the RDF Validator, which validates the result against the standard and also provides an edged graph and N-Triples breakdown of the RDF.

Tip

You can access the RDF Validator at http://www.w3.org/RDF/Validator/.

As a test case for the PostCon vocabulary, information about the giant squid articles introduced in Chapter 2 through Chapter 4 is recorded using the domain elements from the last section. The articles are particularly useful as test cases because they have been moved about, are related to each other, and both reference and are referenced by external resources. About the only thing that the articles don’t demonstrate is a web resource that has been deleted, and we’ll test this out with another document later.

When creating a new vocabulary, the first thing to do is define the URI for the vocabulary namespace. By convention, this should be the URL of the RDFS document when it is eventually made. In the case of PostCon, I used the following URL for the namespace:

http://burningbird.net/postcon/elements/1.0/

This is actually fairly descriptive—this is the location of the set of PostCon Version 1.0 vocabulary elements. When the RDFS document for the vocabulary is finished, it will be dropped into this location primarily for use by utilities that make use of it for RDF/XML exploration (covered in Chapter 7).

Tip

There is no requirement as to the structure of the URI for a namespace, nor does the RDFS document have to exist—but it is good practice to use a consistent namespace and to create the document and place it in the URL of the namespace.
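As a small illustration of how the namespace URI is used, the sketch below expands a QName such as pstcn:title into a full property URI by concatenating the namespace URI and the local name. The expand_qname helper and the PREFIXES table are hypothetical names used only for this example.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pstcn": "http://burningbird.net/postcon/elements/1.0/",
}


def expand_qname(qname: str) -> str:
    """Turn prefix:local into namespace URI + local name."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


print(expand_qname("pstcn:title"))
print(expand_qname("rdf:type"))
```

Because the expansion is simple concatenation, the namespace URI should end with a separator such as / or #, as the PostCon namespace does.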

Next up is determining what the URI of the web resource is. We could create a separate identifier for our resources, but my preference for the PostCon system is simply to use as the identifier the URL of the resource when it was first defined within the PostCon RDF/XML vocabulary. What’s important is that it be consistent and unique; any other requirements are purely system dependent, not RDF/XML dependent.

I used the first document in the article series as the test case, and since it was located within the domain http://burningbird.net and within the articles subdirectory, its URI became:

http://burningbird.net/articles/monsters1.htm

However, to simplify the model, xml:base (explained in Chapter 3) is used and set to a value of http://burningbird.net/articles/ (the trailing slash matters when relative URIs are resolved), and the resource URI is set to monsters1.htm.
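The effect of the base URI on a relative reference can be checked with Python’s standard urljoin function, which follows the same general URI resolution rules. This is only a sketch of the resolution step, not of a full RDF/XML parser.

```python
from urllib.parse import urljoin

# Base ending in a slash: the relative name is appended to the directory.
with_slash = urljoin("http://burningbird.net/articles/", "monsters1.htm")

# Base without the slash: the last path segment ("articles") is replaced.
without_slash = urljoin("http://burningbird.net/articles", "monsters1.htm")

print(with_slash)
print(without_slash)
```

Only the first form yields http://burningbird.net/articles/monsters1.htm, which is why the xml:base value in the examples ends with a slash.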

The other top-level predicates are added without their own nested predicates to give a relatively flat model at this point. Example 6-1 shows the RDF/XML at this stage.

Example 6-1. First cut of PostCon vocabulary, with scalar values
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">

  <rdf:Description rdf:about="monsters1.htm">
     <pstcn:bio />
     <pstcn:relevancy />
     <pstcn:presentation />
     <pstcn:history />
     <pstcn:related />
  </rdf:Description>

</rdf:RDF>

Next, we’ll start adding the other predicates to the model, but first, there’s one change we want to make to the model. As it is currently defined, we have the resource, but we don’t necessarily know what it is. It is a web resource, but by the model’s definition it could be any other resource that can be defined by an arbitrary URI, including a person, a place, or a thing. To refine the model, then, we’ll add an rdf:type predicate to it, with a value of http://burningbird.net/postcon/elements/1.0/Resource. However, to make the model as simple as possible, we’ll use an RDF/XML shortcut (detailed in Section 3.5) and replace the rdf:Description block with a reference to this new class:

<pstcn:Resource rdf:about="monsters1.htm">
     <pstcn:bio />
     <pstcn:relevancy />
     <pstcn:presentation />
     <pstcn:history />
     <pstcn:related />
</pstcn:Resource>

The directed graph that results from this change, as shown in Figure 6-1, is no different than if we had used the more formal rdf:Description block with the associated rdf:type predicate.

Figure 6-1. The graph of our PostCon example
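The equivalence of the two forms can be sketched with Python’s standard XML parser. The triples function below is a deliberately tiny reader written for this illustration: it handles only top-level nodes with rdf:about and one level of properties, resolves the subject by simple concatenation with an assumed base, and is not a substitute for a real RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://burningbird.net/articles/"  # assumed base, joined naively

HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">'
)

LONG_FORM = HEADER + """
  <rdf:Description rdf:about="monsters1.htm">
    <rdf:type rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource" />
    <pstcn:bio />
  </rdf:Description>
</rdf:RDF>"""

SHORT_FORM = HEADER + """
  <pstcn:Resource rdf:about="monsters1.htm">
    <pstcn:bio />
  </pstcn:Resource>
</rdf:RDF>"""


def uri(tag):
    # ElementTree reports qualified names as "{namespace}local".
    return tag[1:].replace("}", "", 1)


def triples(rdf_xml):
    found = set()
    for node in ET.fromstring(rdf_xml):
        subject = BASE + node.get("{%s}about" % RDF)
        # A typed node element is shorthand for an rdf:type statement.
        if uri(node.tag) != RDF + "Description":
            found.add((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            obj = prop.get("{%s}resource" % RDF)
            if obj is None:
                obj = (prop.text or "").strip()
            found.add((subject, uri(prop.tag), obj))
    return found


print(triples(LONG_FORM) == triples(SHORT_FORM))
```

Both documents produce the same rdf:type statement; the short form simply encodes it in the element name.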

Next we’ll start adding the predicates, beginning with pstcn:bio. Since RDF/XML requires a striped syntax of node-arc-node-arc, and pstcn:bio is acting as an arc, pstcn:bio’s contents must be redefined as a blank node: a resource without a URI. Adding an rdf:Description block to pstcn:bio and then adding its predicates, as shown in Example 6-2, accomplishes this. The predicates are named the same as the attributes defined in Table 6-1, but converted to QNames per the RDF/XML requirement. Changes to the RDF/XML are boldfaced.

Example 6-2. Adding in the pstcn:bio predicates
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">

  <pstcn:Resource rdf:about="monsters1.htm">
     <pstcn:bio>
        <rdf:Description>
           <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
           <pstcn:abstract>
            When I think of "monsters" I think of the creatures of 
            legends and tales, from the books and movies, and 
            I think of the creatures that have entertained me for years.
           </pstcn:abstract>
           <pstcn:description>
            Part 1 of four-part series on cryptozoology, legends, 
            Nessie the Loch Ness Monster and the giant squid.
           </pstcn:description>
           <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
           <pstcn:author>Shelley Powers</pstcn:author>
           <pstcn:owner>Burningbird Network</pstcn:owner>
        </rdf:Description>
     </pstcn:bio>   
     <pstcn:relevancy />
     <pstcn:presentation />
     <pstcn:history />
     <pstcn:related />
  </pstcn:Resource>

</rdf:RDF>

The pstcn:bio resource isn’t given a URI because one doesn’t exist for it. The resulting graph shows a computer-generated blank node identifier assigned to the resource.

Again, in the interests of simplifying the model as much as possible, another RDF/XML shortcut is applied to the model. In this case, the attribute rdf:parseType is added to the pstcn:bio element, and its value is set to "Resource". Doing this, we can eliminate the rdf:Description block:

<pstcn:bio rdf:parseType="Resource">
   <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
   <pstcn:abstract>
     When I think of "monsters" I think of the creatures of 
     legends and tales, from the books and movies, and 
     I think of the creatures that have entertained me for years.
   </pstcn:abstract>
   <pstcn:description>
    Part 1 of four-part series on cryptozoology, legends, 
    Nessie the Loch Ness Monster and the giant squid.
   </pstcn:description>
   <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
   <pstcn:author>Shelley Powers</pstcn:author>
   <pstcn:owner>Burningbird Network</pstcn:owner>
</pstcn:bio>

Though simplified with this syntactic change, the resulting directed graph of the model at this point, as shown in Figure 6-2, is equivalent to the longer, more formal syntax.

Figure 6-2. RDF directed graph of model defined in Example 6-2

Tip

Though the resulting XML is simpler when using one of the established shortcuts, it doesn’t necessarily reflect either the N-Triples or the directed graph of the model. This could be confusing for people new to RDF/XML. When documenting your model, you’ll most likely want to start with the more formal RDF/XML syntax and then demonstrate the vocabulary with instances that use the shortcuts.

In Figure 6-2, I show the bio properties grouped via a blank node. Coming from a relational database background, my first inclination is to group related properties into a resource and link this back to the primary resource, rather than “flatten” the model and include each property as a direct attribute of the original resource. I follow this approach with RDF, primarily because, in my opinion, it leads to cleaner RDF processing—whether that processing occurs manually or through automation.

If I had listed each of the “grouped” properties directly with the resource, there’s no breakdown for relevancy or for the resource’s bio. If a specific process was interested only in the biographical elements, each bio-related attribute would then have to be defined as biographically related to highlight it from the other properties. Now, if the bio-related properties were defined within one specific RDF “entity” (resource), it’s a simple matter to process only bio properties just by processing all elements within the designated bio resource. Whether you’re generating RDF through an API, consuming it with an RDF parser, or visually looking at an RDF document, grouping the properties through derived resources makes sense.
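As a rough illustration of why the grouping helps processing, the following sketch uses Python’s standard XML parser to pull out only the bio properties by walking the children of the pstcn:bio element. It reads the XML structure directly rather than the RDF graph, so treat it as a convenience sketch under that assumption, not as how an RDF API would do it.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pstcn": "http://burningbird.net/postcon/elements/1.0/",
}

DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <pstcn:Resource rdf:about="monsters1.htm">
    <pstcn:bio rdf:parseType="Resource">
      <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
      <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
      <pstcn:author>Shelley Powers</pstcn:author>
      <pstcn:owner>Burningbird Network</pstcn:owner>
    </pstcn:bio>
    <pstcn:relevancy />
  </pstcn:Resource>
</rdf:RDF>"""

root = ET.fromstring(DOC)
bio = root.find("pstcn:Resource/pstcn:bio", NS)

# Everything nested in the bio node is, by construction, a bio property.
bio_properties = {
    child.tag.split("}", 1)[1]: " ".join(child.text.split())
    for child in bio
}

for name, value in bio_properties.items():
    print(name, "=", value)
```

No per-property flag is needed to tell bio data from relevancy data; membership in the grouping node carries that information.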

The other groupings of attributes, such as relevancy and presentation, are completed in the same manner as bio and I won’t cover all that here. However, the Related predicate is handled differently and is therefore covered in the next section.

Tip

The PostCon vocabulary is used as a test case in all the examples for the rest of the book.

Adding Repeating Values

Not all recorded values occur as single properties within the PostCon vocabulary—a web resource can move many times, and there can be more than one recommended resource to replace an outdated item. The vocabulary must be able to handle repeating properties. Within the RDF specification, you can use the same predicate in multiple statements, such as the following:

<pstcn:related rdf:resource="monsters2.htm" />
<pstcn:related rdf:resource="monsters3.htm" />
<pstcn:related rdf:resource="monsters4.htm" />

The distinguishing aspect of these statements then becomes the object, the predicate value. Attached to the primary resource, this syntax states that there are three related resources for the entity being defined. It also states that there’s no order to the resources, and the only point of connectivity between the resources is that they’re related, in some way, to the original entity. There is neither an implicit nor an explicit grouping between the items.
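The unordered, ungrouped nature of repeated predicates can be sketched by collecting the objects into a Python set, which also has no order. The code below reads the XML with the standard library and resolves each rdf:resource value against xml:base; it is an illustration only and assumes the document shape shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PSTCN = "http://burningbird.net/postcon/elements/1.0/"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <pstcn:Resource rdf:about="monsters1.htm">
    <pstcn:related rdf:resource="monsters2.htm" />
    <pstcn:related rdf:resource="monsters3.htm" />
    <pstcn:related rdf:resource="monsters4.htm" />
  </pstcn:Resource>
</rdf:RDF>"""

root = ET.fromstring(DOC)
base = root.get(XML_BASE)

# Same predicate, three statements: only the object differs.
related = {
    urljoin(base, prop.get("{%s}resource" % RDF))
    for prop in root.iter("{%s}related" % PSTCN)
}

print(sorted(related))
```

Sorting is applied only for display; the statements themselves carry no sequence.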

At this point, the RDF/XML just shows the three related resources, and the resulting directed graph would show these items with ovals drawn around the objects as well as the resource. However, if I wanted to include additional information about the relationship between the related resources and the resource being defined in the document, I could do so in a couple of ways.

First, I can define the related resource using the rdf:parseType="Resource" setting as I did with pstcn:bio. The problem with this is that each of the related resources actually does have a URI, and using rdf:parseType, I’d lose this information. Instead, what I’ll use is the rdf:resource attribute. This allows me to specify the URI for the resource.

Since these resources are related but separate from the main resource, I tend to want my model to reflect this, so I’ll define the related resources as separate resources, related only through the URI. Example 6-3 shows the RDF/XML for the PostCon instance with the three related resources, each of them defined using the pstcn:Resource class, and each including the related resource attributes of title and reason.

Example 6-3. Adding in related PostCon resources
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">

  <pstcn:Resource rdf:about="monsters1.htm">

     <pstcn:bio rdf:parseType="Resource">
        <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
        <pstcn:abstract>
         When I think of "monsters" I think of the creatures of 
         legends and tales, from the books and movies, and 
         I think of the creatures that have entertained me for years.
        </pstcn:abstract>
        <pstcn:description>
         Part 1 of four-part series on cryptozoology, legends, 
         Nessie the Loch Ness Monster and the giant squid.
        </pstcn:description>
        <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
        <pstcn:author>Shelley Powers</pstcn:author>
        <pstcn:owner>Burningbird Network</pstcn:owner>
     </pstcn:bio>   

     <pstcn:related rdf:resource="monsters2.htm" />
     <pstcn:related rdf:resource="monsters3.htm" />
     <pstcn:related rdf:resource="monsters4.htm" />

  </pstcn:Resource>

  <pstcn:Resource rdf:about="monsters2.htm">
     <pstcn:title>Cryptozoology</pstcn:title>
     <pstcn:reason>First in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters3.htm">
     <pstcn:title>A Tale of Two Monsters: Architeuthis Dux</pstcn:title>
     <pstcn:reason>Second in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters4.htm">
     <pstcn:title>Nessie, the Loch Ness Monster</pstcn:title>
     <pstcn:reason>Fourth in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

</rdf:RDF>

Since the predicates associated with each related resource are simple and nonrepeating, I’m going to apply another shortcut to simplify the model—simple nonrepeating predicates can be listed as attributes on the resource:

  <pstcn:Resource rdf:about="monsters2.htm" 
         pstcn:title="Cryptozoology"
         pstcn:reason="First in the Tale of Two Monsters series." />
  <pstcn:Resource rdf:about="monsters3.htm"
         pstcn:title="A Tale of Two Monsters: Architeuthis Dux"
         pstcn:reason="Second in the Tale of Two Monsters series." />
  <pstcn:Resource rdf:about="monsters4.htm"
         pstcn:title="Nessie, the Loch Ness Monster"
         pstcn:reason="Fourth in the Tale of Two Monsters series." />

The resulting N-Triples and directed graph are the same. The only difference this change makes is to make the XML simpler and a little easier to read. It’s also more comfortable for people familiar with XML, though, as stated earlier, it does tend to obscure the RDF constructs.
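A quick way to convince yourself that the element and attribute forms say the same thing is to reduce each to its literal statements and compare. The literal_triples function is a hypothetical helper written for this sketch: it keeps ElementTree’s {namespace}local names as predicates, ignores rdf: attributes, and handles only this simple shape.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">'
)

ELEMENT_FORM = HEADER + """
  <pstcn:Resource rdf:about="monsters4.htm">
    <pstcn:title>Nessie, the Loch Ness Monster</pstcn:title>
    <pstcn:reason>Fourth in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
</rdf:RDF>"""

ATTRIBUTE_FORM = HEADER + """
  <pstcn:Resource rdf:about="monsters4.htm"
      pstcn:title="Nessie, the Loch Ness Monster"
      pstcn:reason="Fourth in the Tale of Two Monsters series." />
</rdf:RDF>"""


def literal_triples(rdf_xml):
    found = set()
    for node in ET.fromstring(rdf_xml):
        subject = node.get("{%s}about" % RDF)
        # Property attributes: every attribute outside the rdf: namespace.
        for name, value in node.attrib.items():
            if not name.startswith("{" + RDF):
                found.add((subject, name, value))
        # Property elements with literal content.
        for prop in node:
            found.add((subject, prop.tag, (prop.text or "").strip()))
    return found


print(literal_triples(ELEMENT_FORM) == literal_triples(ATTRIBUTE_FORM))
```

The comparison strips surrounding whitespace from element content, since stray spaces inside an element would otherwise become part of the literal.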

Another reason to use this shortcut is that, if I preferred not to list the resources separately, I could embed them, with the predicates redefined as attributes, directly within the main resource. You couldn’t do this with predicates written as child elements, because a property element that carries the rdf:resource attribute must be empty; adding nested predicates to it generates errors. You would instead have to use the more formal node-arc-node syntax by defining the predicate (pstcn:related), which would contain the rdf:Description block, which would then contain the related predicates:

<pstcn:related>
   <rdf:Description rdf:about="monsters3.htm"
         pstcn:title="A Tale of Two Monsters: Architeuthis Dux"
         pstcn:reason="Second in the Tale of Two Monsters series." />
</pstcn:related>

However, you can add predicates to a related resource that is referenced through rdf:resource by using the predicates-as-attributes shortcut, as demonstrated in Example 6-4.

Example 6-4. Embedding related resources directly in main resource
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">

  <pstcn:Resource rdf:about="monsters1.htm">

     <pstcn:bio rdf:parseType="Resource">
        <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
        <pstcn:abstract>
         When I think of "monsters" I think of the creatures of 
         legends and tales, from the books and movies, and 
         I think of the creatures that have entertained me for years.
        </pstcn:abstract>
        <pstcn:description>
         Part 1 of four-part series on cryptozoology, legends, 
         Nessie the Loch Ness Monster and the giant squid.
        </pstcn:description>
        <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
        <pstcn:author>Shelley Powers</pstcn:author>
        <pstcn:owner>Burningbird Network</pstcn:owner>
     </pstcn:bio>   

     <pstcn:related rdf:resource="monsters2.htm" 
         pstcn:title="Cryptozoology"
         pstcn:reason="First in the Tale of Two Monsters series." />
     <pstcn:related rdf:resource="monsters3.htm"
         pstcn:title="A Tale of Two Monsters: Architeuthis Dux"
         pstcn:reason="Second in the Tale of Two Monsters series." />
     <pstcn:related rdf:resource="monsters4.htm"
         pstcn:title="Nessie, the Loch Ness Monster"
         pstcn:reason="Fourth in the Tale of Two Monsters series." />

  </pstcn:Resource>
</rdf:RDF>

In some ways, this demonstrates that you either commit to using formal syntax all the way, or you commit to using abbreviated (shortcut) syntax all the way, at least for one complete RDF construct, such as the related items. My reasons for wanting to list the related resources separately remain, even though both approaches describe the same related resources, so I’ll continue to use the approach demonstrated in Example 6-3.

If I want to show that predicates are related to one another in some way beyond just being related to the defined entity, I’ll use a container to group the items and then attach that container to the entity. The next section describes how.

Adding a Container

The PostCon vocabulary considers movements of the web resource to be related to one another. The first movement occurs when the resource is added to the web site; the second and each additional movement are related to one another by the date and time of the movement. An unlimited number of movements is possible.

To group like items that are related to one another as well as to the main resource, I could use either an RDF Container or a Collection. Both provide the grouping-of-related-items semantics that I need, but the relationship and number of items within the grouping differ based on which construct I use. And that’s how I’ll determine which to use.

As described in Chapter 4, a Container is a group of related items that has no nth point—in other words, it could possibly contain an infinite number of items. A Collection, on the other hand, always has an endpoint, the implicit rdf:nil. Use of Collection creates the assumption that the grouping is of a finite number of objects.

Additional tool-based semantics are associated with containers and collections—such as sequence with rdf:Seq and so on—but these aren’t enforced within the RDF data model/graph, so I won’t depend on them to make my decision about what to use. Instead, I’ll rely on the one factor that is semantically defined in the RDF graph: whether the number of items in the group is infinite. Since I determined that a web resource can have infinite movements, I will choose an RDF Container.
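For contrast, here is a minimal sketch of how the history might have looked had I chosen a Collection instead. It isn’t part of the PostCon vocabulary; it only illustrates the closed list that rdf:parseType="Collection" produces, using two of the movement URIs from this chapter:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">

  <pstcn:Resource rdf:about="http://burningbird.net/articles/monsters1.htm">
     <!-- parseType="Collection" builds an rdf:first/rdf:rest list that ends in rdf:nil -->
     <pstcn:history rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.yasd.com/dynaearth/monsters1.htm" />
        <rdf:Description rdf:about="http://www.dynamicearth.com/articles/monsters1.htm" />
     </pstcn:history>
  </pstcn:Resource>

</rdf:RDF>

Because the list is terminated by rdf:nil, a consumer can treat it as complete, which is exactly the assumption I don’t want to make about movements.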

I now face additional choices, such as which container type to use. There is no enforcement of the Container differences within RDF, but there is a general assumption about behavior attached to each, so I’ll want to pick the RDF Container type (Seq, Bag, or Alt) that fits my vocabulary model.

Since each movement is unique, the Bag type isn’t a good fit because an implicit assumption associated with it is that items can be duplicated. Nor is the Alt type a good fit, because it implicitly represents items that are alternatives to each other. The best fit is Seq, which has implicit associated semantics of related items in a sequence, from first to last. This fits history particularly well.
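As a small aside, the members of a Seq don’t have to be numbered by hand. The sketch below, which is an illustration rather than the form used in the PostCon examples, uses rdf:li; a parser numbers rdf:li members in document order, so it yields the same rdf:_1 and rdf:_2 arcs as the explicit form:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">

  <pstcn:Resource rdf:about="http://burningbird.net/articles/monsters1.htm">
     <pstcn:history>
       <rdf:Seq>
        <!-- first rdf:li becomes rdf:_1, second becomes rdf:_2 -->
        <rdf:li rdf:resource="http://www.yasd.com/dynaearth/monsters1.htm" />
        <rdf:li rdf:resource="http://www.dynamicearth.com/articles/monsters1.htm" />
       </rdf:Seq>
     </pstcn:history>
  </pstcn:Resource>

</rdf:RDF>

The explicit numbering makes the order visible in the document itself, while rdf:li is easier to maintain when members are inserted or removed.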

Each movement has its own URI representing the movement itself, so each one can be identified distinctly. Because of this, my preference is, again, to list these out separately, related to the main resource through the container. Example 6-5 shows the PostCon vocabulary after adding in the Seq container. Note that I created a new class for the movement, pstcn:Movement. I couldn’t use pstcn:Resource, because the movements really aren’t resources. I could have also left the resources defined in generic rdf:Description blocks, but I prefer to embed as much information into the model as possible, and defining the new class—Movement—provides a type to go with each movement definition, independent of the relationship defined by history earlier in the main resource.

Example 6-5. PostCon vocabulary instance showing Movement and related resources
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">

  <pstcn:Resource rdf:about="monsters1.htm">

<!--biography of resource-->
     <pstcn:bio rdf:parseType="Resource">
        <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
        <pstcn:abstract>
         When I think of "monsters" I think of the creatures of 
         legends and tales, from the books and movies, and 
         I think of the creatures that have entertained me for years.
        </pstcn:abstract>
        <pstcn:description>
         Part 1 of four-part series on cryptozoology, legends, 
         Nessie the Loch Ness Monster and the giant squid.
        </pstcn:description>
        <pstcn:dateCreated>1999-08-01T00:00:00-06:00</pstcn:dateCreated>
        <pstcn:author>Shelley Powers</pstcn:author>
        <pstcn:owner>Burningbird Network</pstcn:owner>
     </pstcn:bio>   

<!--related resources-->
     <pstcn:related rdf:resource="monsters2.htm" />
     <pstcn:related rdf:resource="monsters3.htm" />
     <pstcn:related rdf:resource="monsters4.htm" />

<!--resource movements-->
     <pstcn:history>
       <rdf:Seq>
        <rdf:_1 rdf:resource="http://www.yasd.com/dynaearth/monsters1.htm" />
        <rdf:_2 rdf:resource="http://www.dynamicearth.com/articles/monsters1.htm" />
        <rdf:_3 rdf:resource="http://burningbird.net/articles/monsters1.htm" />
      </rdf:Seq>    
     </pstcn:history>

  </pstcn:Resource>

<!--related resource definitions-->
  <pstcn:Resource rdf:about="monsters2.htm">
     <pstcn:title>Cryptozoology</pstcn:title>
     <pstcn:reason>First in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters3.htm">
     <pstcn:title>A Tale of Two Monsters: Architeuthis Dux (Giant Squid)</pstcn:title>
     <pstcn:reason>Second in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters4.htm">
     <pstcn:title>Nessie, the Loch Ness Monster</pstcn:title>
     <pstcn:reason>Fourth in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

<!--resource movement definitions-->
  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters1.htm">
      <pstcn:movementType>Add</pstcn:movementType>
      <pstcn:reason>New Article</pstcn:reason>
      <pstcn:date>1998-01-01T00:00:00-05:00</pstcn:date>
  </pstcn:Movement>
  <pstcn:Movement rdf:about="http://www.dynamicearth.com/articles/monsters1.htm">
      <pstcn:movementType>Move</pstcn:movementType>
      <pstcn:reason>moved to dynamicearth.com domain</pstcn:reason>
      <pstcn:date>1999-10-31T00:00:00-05:00</pstcn:date>
  </pstcn:Movement>
  <pstcn:Movement rdf:about="http://burningbird.net/articles/monsters1.htm">
      <pstcn:movementType>Move</pstcn:movementType>
      <pstcn:reason>Moved to burningbird.net</pstcn:reason>
      <pstcn:date>2002-11-01T00:00:00-05:00</pstcn:date>
  </pstcn:Movement>

</rdf:RDF>

There is also something intriguing in this RDF/XML example—the actual resource is defined both as the document Resource and as a Movement (in fact, the last movement for the history since the resource was defined in the PostCon system before any additional movements were made). This is perfectly legitimate and results in an interesting directed graph of a resource that has an arc pointing back to itself, as demonstrated in Figure 6-3.

Figure 6-3. A resource containing a predicate whose value is the same URI as the original resource

Also notice in the figure that the original resource now has two type properties associated with it: one for Resource and one for Movement. Again, this is perfectly legitimate RDF. In fact, the more knowledge we can put into the model, and the simpler the syntax, the better.
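To make the two type properties concrete, the following sketch spells out what the two typed nodes in Example 6-5 say about the one URI, using a plain rdf:Description and explicit rdf:type properties. It is an expanded restatement for illustration, not an addition to the vocabulary:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <rdf:Description rdf:about="http://burningbird.net/articles/monsters1.htm">
     <!-- from the pstcn:Resource typed node -->
     <rdf:type rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource" />
     <!-- from the pstcn:Movement typed node -->
     <rdf:type rdf:resource="http://burningbird.net/postcon/elements/1.0/Movement" />
  </rdf:Description>

</rdf:RDF>

The typed-node form used in the example is simply shorthand for these rdf:type statements.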

Adding in a Value

The example RDF/XML demonstrated to this point has focused on bio, history, and related resources. The other PostCon elements, relevancy and presentation, are treated the same as bio, except for one new construct: presentation’s requires property. Unlike the other properties defined in the document up to this point, requires is neither a straight resource property nor a literal; it’s a value with an associated type that determines how the value is treated. The ideal RDF/XML construct to represent this is rdf:value.

Without replicating all of the presentation properties, the following RDF/XML demonstrates how rdf:value works for pstcn:requires. The pstcn:requires property is defined with an rdf:parseType of "Resource" and has two properties of its own: pstcn:type, which specifies the type of required resource, and rdf:value, which holds the actual value. Two resources are required:

<pstcn:presentation rdf:parseType="Resource">
   <pstcn:requires rdf:parseType="Resource">
      <pstcn:type>stylesheet</pstcn:type>
      <rdf:value>http://burningbird.net/de.css</rdf:value>
   </pstcn:requires>
   <pstcn:requires rdf:parseType="Resource">
      <pstcn:type>logo</pstcn:type>
      <rdf:value>http://burningbird.net/mm/dynamicearth.jpg</rdf:value>
   </pstcn:requires>
</pstcn:presentation>

The intended semantics for rdf:value are that it always references the actual value of the predicate; anything else is just descriptive information about how that value is treated.

The rest of the vocabulary uses the same constructs as have been used to this point and is omitted for brevity. A complete example of the vocabulary is given later, after a few modifications are made to merge the vocabulary with the Dublin Core. In the meantime, though, testing the vocabulary demonstrated to this point against other web site test cases shows that it accommodates all of the business domain data. At this point, we can be comfortable that the vocabulary matches the system needs. The next step is to formalize the vocabulary schema using RDF Schema.

Formalizing the Vocabulary with RDFS

Formally defined RDFS schemas aren’t required for all RDF documents, but a schema documents how the vocabulary is meant to be used, which helps keep RDF documents based on it semantically and syntactically consistent across implementations.

RDFS defines which vocabulary elements are classes and which are properties. In addition, RDFS associates each property with the classes it can be used with (its domain) and defines the range for each property. This is particularly helpful when defining properties that draw from a set of values, such as the pstcn:movementType property in the last section. RDFS also documents the type of literal that each property can reference, such as a string or an integer.

What Is a Class and What Is a Property?

Determining what is a class and what is a property within the vocabulary is an interesting RDF Schema challenge. Your first reaction might be that an RDFS Class is equivalent to a relational data model Entity, but that doesn’t hold.

In actuality, an RDFS Class is any item that can be used in place of an rdf:Description block, with an associated rdf:type, such as Movement or Resource. An RDFS Class is not a resource property, like bio, presentation, or relevancy.

Tip

A quick test to double-check your use of RDFS Class versus RDFS Property for an item is to use ICS-FORTH’s Validating RDF Parser (VRP), asking for a graph output on the test RDF/XML document. This tool actually identifies which elements it views as classes and which as properties. This tool is covered in Chapter 7.

Defining the Vocabulary Classes

To start, define the PostCon vocabulary classes. Table 6-1 shows that the classes mark the main objects defined within the table, as you would expect. Using RDFS, then, the main object of the vocabulary, Resource, is defined with the following RDF/XML syntax:

<rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Resource">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
 <rdfs:label xml:lang="en">Web Resource</rdfs:label>
 <rdfs:comment xml:lang="en">
    Web resource managed with PostCon System
 </rdfs:comment>
</rdfs:Class>

This RDF/XML defines Resource to be an RDFS Class, defined within the schema http://burningbird.net/postcon/elements/1.0/, which is a subclass of the RDFS Resource class (rdfs:Resource). Its human-readable label is Web Resource, and the comment provides a brief description of the item. Both label and comment have an xml:lang attribute defining the language. If you’re providing multilingual support for your elements, repeat the label and comment but change the xml:lang attribute value.
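As a sketch of the multilingual case, the class definition might carry one label and comment per language. The French wording below is my own illustrative translation, not part of the published PostCon schema:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Resource">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Web Resource</rdfs:label>
 <!-- hypothetical translation, repeated element with a different xml:lang -->
 <rdfs:label xml:lang="fr">Ressource Web</rdfs:label>
 <rdfs:comment xml:lang="en">Web resource managed with PostCon System</rdfs:comment>
 <rdfs:comment xml:lang="fr">Ressource Web gérée par le système PostCon</rdfs:comment>
</rdfs:Class>

</rdf:RDF>

A tool that displays labels can then pick the one whose xml:lang matches the reader’s language.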

Tip

Though things such as label and comments aren’t necessary for the schema, you should always include these. BrownSauce, a Java-based RDF browser (described in Chapter 7), provides this information to people browsing RDF/XML documents.

This class by itself demonstrates the need for namespaces within RDF/XML; the RDFS vocabulary also has a Resource class. The same type of RDFS/XML is also applied to Movement, the other PostCon class. All other elements, including bio, relevancy, and presentation, are defined as properties.

Defining the Properties

Each property within the vocabulary is then defined, with data type information, human-readable comments and labels, and a definition of the relationship between the property and the classes. The latter is particularly important because it provides usage guidelines as well as an understanding of the schema.

An example of a property definition for PostCon is the following, for type:

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/type">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource Type</rdfs:label>
 <rdfs:comment>Type of Required Resource</rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>

The type element has a range that determines the type of values associated with it. In this case, the range is rdfs:Literal, meaning the element will contain literal values. A property definition can also list one or more domains, which show the classes the property is associated with, as the related property does later in this section; none is listed here.

The other properties are defined using almost the same schema, changing the label, comments, and domain as appropriate; the two properties history and related are different from the other properties, though, because they don’t describe a literal. For instance, here is the definition for the related property:

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/related">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Related Resource</rdfs:label>
 <rdfs:comment xml:lang="en">
    Resources within PostCon system related to current resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

The predicate object associated with related is a resource of class Resource, which is also the property’s domain. Other than that, though, the definition is fairly close to how all the properties are defined.

The complete schema is shown in the next section. Note that testing the schema within the RDF Validator does prove that the RDF Schema is valid RDF/XML. The resultant RDF graph is a bit hard to read, though—all those references to the same RDFS classes.

Certain of the properties in the schema have an “allowable values are...” note within their comments. There is currently no way to constrain allowable literals within RDF Schema. However, since the schema is used more for human than for machine interpretation, including this information within the comment is useful.
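A definition carrying such a note might look like the following sketch for movementType. The label, the comment wording, and the list of values are assumptions on my part, based only on the Add and Move values used in this chapter’s examples; the real schema may list more:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/movementType">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Movement Type</rdfs:label>
 <!-- the constraint lives only in the human-readable comment -->
 <rdfs:comment xml:lang="en">
    Type of movement. Allowable values are Add and Move.
 </rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Movement"/>
</rdf:Property>

</rdf:RDF>

Nothing in RDFS stops a document from using some other literal here; the comment is guidance for people and for any application that chooses to enforce it.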

A vocabulary schema defines vocabulary elements and their relationship with one another and with the RDF and RDFS elements. For instance, since the PostCon schema document is a resource, using the PostCon vocabulary elements within the document to detail its creation is perfectly acceptable.

This approach is used within another widely used vocabulary, the Dublin Core (DC), which we will look at next and compare to the PostCon vocabulary. We’ll also find that we can modify PostCon to make use of DC elements, simplifying it.

Integrating the Dublin Core

According to the mission statement, located at http://www.dublincore.org/:

The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI’s activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.

The Dublin Core’s primary purpose is to discover a metadata model that can be used to describe resources intelligently so that this information can be used in more efficient and intelligent resource searches, knowledge systems, and so on.

At first, this description of Dublin Core may position it as a competitive specification to RDF, but in reality, they’re highly compatible. Dublin Core is an effort to define the business data of the Web, so to speak. RDF, on the other hand, is a way of recording this metadata so that it can be merged with other metadata defined for other businesses, not just the business of the Web. In other words, RDF is the methodology, and Dublin Core is one business employing the RDF methodology.

Since Dublin Core is an effort to define business data, serializing that data need not be done with RDF. The Dublin Core project provides an RDF/XML version of the data that it has defined, true. But it also provides one in simple, basic XML and one in HTML. However, it is the RDF/XML version we’re interested in and will focus on at this time.

An Overview of the Dublin Core MetaData Element Set

The Dublin Core MetaData Element set (Version 1.1, found at http://www.dublincore.org/documents/1999/07/02/dces/) consists of a core set of elements that make up what is known as simple Dublin Core. These elements are:

title

A name given to the resource

creator

An entity responsible for making the content of the resource

subject

The topic of the content of the resource

description

An account of the content of the resource

publisher

An entity responsible for making the content available

contributor

An entity responsible for making contributions to the content of the resource

date

A date associated with an event in the life cycle of the resource

type

The nature or genre of the content of the resource

format

The physical or digital manifestation of the resource

identifier

An unambiguous reference to the resource within a given context

source

A reference to the resource from which the present resource is derived

language

A language of the intellectual content of the resource

relation

A reference to a related resource

coverage

The extent or scope of the content of the resource

rights

Information about rights held in and over the resource

Additional information is associated with some of the elements. For example, language values are drawn from the two-character language codes defined in ISO 639 (such as “EN” for English), and date has a recommended format (YYYY-MM-DD).

As you can see immediately, several DC elements could be used in place of PostCon elements. First, though, let’s take a look at Dublin Core implemented as RDF/XML.

Dublin Core in RDF/XML

The Dublin Core vocabulary is one of the simplest, which is probably one reason it’s so heavily used. The namespace for the elements is at:

http://purl.org/dc/elements/1.1/

If you go to this URL with your browser, you’ll see an actual document, with a schema description for each element. The prefix usually given for the Dublin Core namespace within an RDF document is dc, which we’ll use in this chapter.
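As a quick illustration of how little is needed, here is a sketch of the monsters1.htm resource described with simple Dublin Core alone. The values are taken from the PostCon examples earlier in the chapter; the choice of these four elements is mine:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">

  <rdf:Description rdf:about="http://burningbird.net/articles/monsters1.htm">
     <!-- every simple DC element is a property of the one resource -->
     <dc:title>Tale of Two Monsters: Legends</dc:title>
     <dc:creator>Shelley Powers</dc:creator>
     <dc:publisher>Burningbird Network</dc:publisher>
     <dc:format>text/html</dc:format>
  </rdf:Description>

</rdf:RDF>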

I won’t include the document here, nor will I discuss each element. However, some elements are of particular interest because they seem to map to a PostCon element. And if there’s a way of reducing PostCon, we’ll want to pursue it.

For instance, one element from PostCon that definitely looks to be in DC is title. The Dublin Core title is defined to be “a name given to the resource.” Since our definition of title in PostCon is “resource’s title,” we have a match. Looking at the schema definition for the property we find:

<rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
 <rdfs:label xml:lang="en-US">Title</rdfs:label>
 <rdfs:comment xml:lang="en-US">A name given to the resource.</rdfs:comment>
 <dc:description xml:lang="en-US">Typically, a Title will be a name by which the
  resource is formally known.</dc:description>
 <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/" />
 <dcterms:issued>1999-07-02</dcterms:issued>
</rdf:Property>

There are some differences between this and the original PostCon title schema definition. For instance, the schema for the PostCon title listed the property’s domains (that is, acceptable contexts for the property) to be the pstcn:Resource class (and indirectly to Movement, which is a subclass of pstcn:Resource). The DC doesn’t list domains because it doesn’t seek to limit what classes it can be used for, opening the door for us to use the property in PostCon.

Another difference is that DC is used directly to describe the property. Again, this won’t adversely impact the use of title in PostCon. In fact, the additional information is helpful. Finally, there is another property assigned to a different namespace: dcterms:issued. Before we can determine whether this property will limit our use of title in PostCon, we’ll have to take a closer look at this new schema.

Tip

For more on Dublin Core in RDF/XML, see the pending recommendation “Expressing Simple Dublin Core in RDF/XML,” authored by Dave Beckett, Eric Miller, and Dan Brickley, and found at http://www.dublincore.org/documents/2001/11/28/dcmes-xml/.

Qualified Dublin Core

All of the Dublin Core metadata elements are properties within the context of RDF. Within an RDF graph, that means that all of them radiate out from a single resource. Again, this makes the vocabulary attractive to use because it is so simple and uncomplicated. However, there are basic limitations to how broadly one can stretch any one element to meet a specific use. And by stretching meanings at all, we lose some refinement.

Sure, we can group all dates together, but do we want to?

So, the Dublin Core Working Group set out to define a set of qualifiers that limit or modify the meaning of the DC elements. Additionally, the group determined that the qualifiers belonged in one of two different categories: qualifiers for element refinement and qualifiers for encoding scheme.

Element refinement qualifiers restrict the scope of the element. For instance, there is the general concept of date and then there is creation date (from PostCon), modified date, and so on. Those vocabularies that want such refinements can use things such as modified date and creation date. However, vocabularies (or applications) that don’t care about the refinement can ignore it and just treat the qualified elements as date.

Element refinement qualifiers are based on the business of the schema rather than its implementation. Encoding scheme qualifiers, though, exist purely to help with parsing and interpretation of the data. Again, date can have many interpretations as to what type of date is being recorded. By using encoding scheme qualifiers, there’s no confusion about what to expect for data within a specific date field.

When looking at Dublin Core, we can see uses for several of the elements, but when we look at the qualified Dublin Core implemented in RDF/XML, we find a strong match for several PostCon classes and properties.

First, the namespace for the qualified Dublin Core Schema is at http://purl.org/dc/terms/. The namespace prefix for the qualified Dublin Core is usually dcterms.
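With that prefix in hand, an encoding qualifier can be sketched as follows. This is only an illustration of one pattern, as I understand it from the Dublin Core RDF/XML drafts of the time: a typed node names the encoding (here dcterms:W3CDTF, the W3C date and time format) and rdf:value carries the date itself, the same rdf:value construct used for pstcn:requires earlier. Treat the exact form as an assumption rather than as the recommended serialization:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/">

  <rdf:Description rdf:about="http://burningbird.net/articles/monsters1.htm">
     <dc:date>
        <!-- the typed node names the encoding; rdf:value holds the date itself -->
        <dcterms:W3CDTF>
           <rdf:value>1999-08-01</rdf:value>
        </dcterms:W3CDTF>
     </dc:date>
  </rdf:Description>

</rdf:RDF>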

The first property that attracts attention is created, a qualifier on the date property. The created definition is:

<rdf:Property rdf:about="http://purl.org/dc/terms/created">
  <rdfs:label>Created</rdfs:label>
  <rdfs:comment>Date of creation of the resource.</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource = "http://purl.org/dc/elements/1.1/date" />
  <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/terms/" />
</rdf:Property>

The thing to focus on is the comment Date of creation of the resource. This exactly matches the description for the pstcn:creationDate property in PostCon. In the last section, we weren’t sure how to handle the dcterms:issued, but now we know it’s nothing more than an issued date, a further qualification of the specification for the title property.
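The rdfs:subPropertyOf line in the created definition is what lets an application ignore the refinement. The sketch below, using the creation date from the PostCon examples, shows the statement a document would make and, for illustration only, the more general statement that an RDFS-aware application may infer from it:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/">

  <rdf:Description rdf:about="http://burningbird.net/articles/monsters1.htm">
     <!-- what the document states -->
     <dcterms:created>1999-08-01T00:00:00-06:00</dcterms:created>
     <!-- what can be inferred, since created is a subproperty of date -->
     <dc:date>1999-08-01T00:00:00-06:00</dc:date>
  </rdf:Description>

</rdf:RDF>

In practice you would write only the dcterms:created statement and leave the dc:date reading to the consuming application.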

Another set of properties that seem similar to PostCon elements is the DC relation property and its qualified refinements: dcterms:isReplacedBy, dcterms:references, and so on. They’re not used to replace PostCon’s related property (and associated Resource class), though, because the DC properties have built-in semantics that don’t encompass all of PostCon’s related property semantics. However, PostCon’s pstcn:dependencies and DC’s qualifier dcterms:requires seem to be a good match.

After the first glance, both the original Dublin Core elements and the qualified element set seem to have good replacements, or additions, to the PostCon vocabulary. And since both are defined within RDF, it will be simple to use them together in RDF/XML documents.

Mixing Vocabularies

After this first look at the Dublin Core simple and qualified elements, I decided to replace the PostCon properties demonstrated in this chapter with matching DC elements. These include the following replacements:

pstcn:title

dc:title

pstcn:author

dc:creator

pstcn:owner

dc:publisher

pstcn:abstract

dcterms:abstract

pstcn:description

dc:description

pstcn:creationDate

dcterms:created

pstcn:date

dc:date

I also decided to add the format property, to provide the resource file type. Small changes, but they do reduce the size of the PostCon vocabulary, as well as allowing easier data sharing on these items.

To see how these two vocabularies work together, the RDF/XML for the sample monsters1.htm resource is provided in Example 6-6. The Dublin Core Schema namespaces are added to the top-level RDF element, and the dc and dcterms properties are used in place of the now-removed PostCon properties. In addition, both Relevancy and the Presentation resources have been added to complete the document.

Example 6-6. Mixing PostCon and DC vocabulary elements
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:base="http://burningbird.net/articles/">

  <pstcn:Resource rdf:about="monsters1.htm">

<!--Resource biographical information-->
     <pstcn:bio rdf:parseType="Resource">
        <dc:title>Tale of Two Monsters: Legends</dc:title>
        <dcterms:abstract>
            When I think of "monsters" I think of the creatures of 
            legends and tales, from the books and movies, and 
            I think of the creatures that have entertained me for years.
        </dcterms:abstract>
        <dc:description>
            Part 1 of four-part series on cryptozoology, legends, 
            Nessie the Loch Ness Monster and the giant squid.
        </dc:description>
        <dcterms:created>1999-08-01T00:00:00-06:00</dcterms:created>
        <dc:creator>Shelley Powers</dc:creator>
        <dc:publisher>Burningbird Network</dc:publisher>
     </pstcn:bio>

<!--Resource's relevancy at time RDF/XML document was built-->
      <pstcn:relevancy rdf:parseType="Resource">
        <pstcn:currentStatus>Active</pstcn:currentStatus>
        <dcterms:valid>2003-12-01T00:00:00-06:00</dcterms:valid>
        <dc:subject>legends</dc:subject>
        <dc:subject>giant squid</dc:subject>
        <dc:subject>Loch Ness Monster</dc:subject>
        <dc:subject>Architeuthis Dux</dc:subject>
        <dc:subject>Nessie</dc:subject>
        <dcterms:isReferencedBy rdf:resource="http://www.pibburns.com/cryptozo.htm" />
        <dcterms:references rdf:resource="http://www.nrcc.utmb.edu/" />
      </pstcn:relevancy>

<!--Presentation/consumption information about resource-->
      <pstcn:presentation rdf:parseType="Resource">
         <dc:format>text/html</dc:format>
         <dcterms:conformsTo>XHTML 1.0 Strict</dcterms:conformsTo>
         <dcterms:conformsTo>CSS Validation</dcterms:conformsTo>
         <dcterms:requires>HTML User agent</dcterms:requires>
         <pstcn:requires rdf:parseType="Resource">
            <pstcn:type>stylesheet</pstcn:type>
            <rdf:value>http://burningbird.net/de.css</rdf:value>
         </pstcn:requires>
         <pstcn:requires rdf:parseType="Resource">
            <pstcn:type>logo</pstcn:type>
            <rdf:value>http://burningbird.net/mm/dynamicearth.jpg</rdf:value>
         </pstcn:requires>
      </pstcn:presentation>

<!--History of events of resource-->
     <pstcn:history>
       <rdf:Seq>
        <rdf:_1 rdf:resource="http://www.yasd.com/dynaearth/monsters1.htm" />
        <rdf:_2 rdf:resource="http://www.dynamicearth.com/articles/monsters1.htm" />
        <rdf:_3 rdf:resource="http://burningbird.net/articles/monsters1.htm" />
      </rdf:Seq>    
     </pstcn:history>

<!--Resources internal to PostCon that are related to resource-->
     <pstcn:related rdf:resource="monsters2.htm" />
     <pstcn:related rdf:resource="monsters3.htm" />
     <pstcn:related rdf:resource="monsters4.htm" />
  </pstcn:Resource>

<!--Related resources-->
  <pstcn:Resource rdf:about="monsters2.htm">
     <dc:title>Cryptozoology</dc:title>
     <pstcn:reason>First in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters3.htm">
     <dc:title>A Tale of Two Monsters: Architeuthis Dux (Giant Squid)</dc:title>
     <pstcn:reason>Second in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters4.htm">
     <dc:title>Nessie, the Loch Ness Monster</dc:title>
     <pstcn:reason>Fourth in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

<!--Resource events-->
  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters1.htm">
      <pstcn:movementType>Add</pstcn:movementType>
      <pstcn:reason>New Article</pstcn:reason>
      <dc:date>1998-01-01T00:00:00-05:00</dc:date>
  </pstcn:Movement>
  <pstcn:Movement rdf:about="http://www.dynamicearth.com/articles/monsters1.htm">
      <pstcn:movementType>Move</pstcn:movementType>
      <pstcn:reason>Moved to separate dynamicearth.com domain</pstcn:reason>
      <dc:date>1999-10-31T00:00:00-05:00</dc:date>
  </pstcn:Movement>
  <pstcn:Movement rdf:about="http://burningbird.net/articles/monsters1.htm">
      <pstcn:movementType>Move</pstcn:movementType>
      <pstcn:reason>Collapsed into Burningbird</pstcn:reason>
      <dc:date>2002-11-01</dc:date>
  </pstcn:Movement>

</rdf:RDF>

Running this document through the RDF Validator generates the expected RDF graph and no errors.

One thing that this exercise demonstrates is the value of keeping a vocabulary small and then adding to it. As you saw with Dublin Core, the group started with a small set of important elements and then extended it with a new set of qualifier elements. This is a good approach for you to follow with your vocabularies and is the approach that other groups such as the RSS Working Group (discussed in Chapter 13) used. If you do so, others are more likely to make use of your vocabulary, and you decrease the chances of having to modify it in the future. The complete RDF Schema for PostCon, after the Dublin Core elements have been identified, is actually quite small. It’s shown in its entirety in Example 6-7.

Example 6-7. PostCon RDF Schema
<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Resource">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Web Resource</rdfs:label>
 <rdfs:comment xml:lang="en">
    Web resource managed with PostCon system
 </rdfs:comment>
 <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource" />
</rdfs:Class>

<rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Movement">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Web Resource Movement</rdfs:label>
 <rdfs:comment xml:lang="en">
    An event for the resource within the PostCon system
 </rdfs:comment>
</rdfs:Class>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/bio">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource biography</rdfs:label>
 <rdfs:comment xml:lang="en">
    Biographical information for resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/relevancy">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource Relevancy</rdfs:label>
 <rdfs:comment xml:lang="en">
    Information related to relevancy of resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/presentation">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource Presentation</rdfs:label>
 <rdfs:comment xml:lang="en">
    Information related to presentation of resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/history">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Web Content History</rdfs:label>
 <rdfs:comment xml:lang="en">
    History of movement of content within system
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/currentStatus">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Current Status</rdfs:label>
 <rdfs:comment>Current status of document (allowable values of Active and Inactive)</rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Relevancy"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/reason">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Reason</rdfs:label>
 <rdfs:comment>Reason</rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/movementType">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Movement Type</rdfs:label>
 <rdfs:comment>Type of Movement (allowable values of Move, Add, Remove)</rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Movement"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/related">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Related Resource</rdfs:label>
 <rdfs:comment xml:lang="en">
    Resources within PostCon system related to current resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/requires">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource Requirement</rdfs:label>
 <rdfs:comment xml:lang="en">
    External resource required by current resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/type">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource Type</rdfs:label>
 <rdfs:comment>Type of Required Resource</rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>

</rdf:RDF>

The schema is in RDF/XML and can be validated. Once validated, it can be embedded within an outer HTML or XHTML document at the location of the schema URI, or left as a pure RDF/XML document in the same location. Publishing it this way isn’t required; the main reason for doing so is to give people the opportunity to review the schema to better understand the vocabulary. Another reason is that some tools, such as BrownSauce (which we’ll look at in detail in Chapter 7), use the schema to provide better information about the RDF graph.
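
To give a rough idea of how a tool might use a published schema, here is a minimal Python sketch that reads an abbreviated copy of two PostCon schema entries and builds a lookup from each URI to its rdfs:label. It is not how BrownSauce works internally; the function name load_labels and the trimmed SCHEMA string are assumptions made for illustration, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Abbreviated copy of two PostCon schema entries, for illustration only.
SCHEMA = """<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Movement">
 <rdfs:label xml:lang="en">Web Resource Movement</rdfs:label>
</rdfs:Class>
<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/movementType">
 <rdfs:label xml:lang="en">Movement Type</rdfs:label>
</rdf:Property>
</rdf:RDF>"""


def load_labels(schema_xml):
    """Map each URI described in the schema to its rdfs:label text."""
    root = ET.fromstring(schema_xml)
    labels = {}
    for node in root:  # top-level rdfs:Class and rdf:Property elements
        uri = node.get("{%s}about" % RDF_NS)
        label = node.find("{%s}label" % RDFS_NS)
        if uri and label is not None and label.text:
            labels[uri] = label.text.strip()
    return labels


labels = load_labels(SCHEMA)
for uri, label in labels.items():
    print(label, "<-", uri)
```

A display tool could consult a table like this to show "Movement Type" in place of the full property URI when rendering a graph.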

Using DC-dot to Generate DC RDF

Much about a document can be deduced directly from the document itself. The format, location, subject, author, copyright, and so on can all be derived by scraping the HTML, including its meta tags, for a particular web resource.

Based on this, an organization going by the abbreviation UKOLN, at the University of Bath in the UK, created the DC-dot generator. This online application will scrape a web resource, pull whatever information it can from it, and then return the result formatted in multiple ways, including RDF, XHTML meta tags, and straight XML.
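
The general idea of scraping can be sketched in a few lines of Python. This is not DC-dot's implementation, only an illustration under stated assumptions: the MetaScraper class, the scrape_dc function, and the META_TO_DC mapping are hypothetical names, and the SAMPLE page head is invented, with values borrowed from the article used in this chapter.

```python
from html.parser import HTMLParser


class MetaScraper(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical mapping from common meta tag names to Dublin Core elements.
META_TO_DC = {
    "author": "dc:creator",
    "description": "dc:description",
    "keywords": "dc:subject",
}


def scrape_dc(html):
    """Return a dict of Dublin Core element names to scraped values."""
    parser = MetaScraper()
    parser.feed(html)
    record = {}
    if parser.title.strip():
        record["dc:title"] = parser.title.strip()
    for name, element in META_TO_DC.items():
        if name in parser.meta:
            record[element] = parser.meta[name]
    return record


# Invented page head, for illustration only.
SAMPLE = """<html><head>
<title>Tale of Two Monsters: Architeuthis Dux</title>
<meta name="author" content="Shelley Powers">
<meta name="description" content="The Giant Squid and its relationship to mythology.">
</head><body></body></html>"""

record = scrape_dc(SAMPLE)
print(record)
```

A real generator would also need to fetch the page, infer values such as format and size from the HTTP response, and serialize the result as RDF/XML; those steps are left out of this sketch.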

I decided to try this with the sample “Tale of Two Monsters” article. In the first page of the application, I entered the URL for the document, and checked both boxes to have the tool attempt to determine publisher and return RDF. The page returned has a first guess at the RDF/XML and provides a form that you can then use to modify the DC elements generated. Figure 6-4 displays the form you can use to modify the results.

Figure 6-4. DC-dot format to modify results

With some modifications, the DC RDF/XML document generated is shown in Example 6-8.

Example 6-8. DC-dot-generated RDF/XML
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://burningbird.net/articles/monsters3.htm">
    <dc:title>
      Tale of Two Monsters: Architeuthis Dux
    </dc:title>
    <dc:creator>
      Shelley Powers
    </dc:creator>
    <dc:subject>
      Internet; Web; Computers; Software; Technology;
      Meteorology; Geology; Oceanography; Astronomy; Math;
      Science; Physics; P2P
    </dc:subject>
    <dc:description>
      The Giant Squid and its relationship to mythology.
    </dc:description>
    <dc:publisher>
      Burningbird
    </dc:publisher>
    <dc:date>
      2002-01-20
    </dc:date>
    <dc:type>
      Text
    </dc:type>
    <dc:format>
      text/html
    </dc:format>
    <dc:format>
      8287 bytes
    </dc:format>
  </rdf:Description>
</rdf:RDF>

The generated RDF/XML validates with the RDF Validator, with one exception on the rdf:Description element: the generator uses an unqualified about attribute, which, though allowed for existing vocabularies, is discouraged in new vocabularies and RDF/XML instances. However, this is a quick change to make.
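
If you have many generated files, the change from about to rdf:about can also be scripted. The following Python sketch, using only the standard library, is one possible approach; the function name qualify_about and the trimmed GENERATED document are assumptions for illustration, not part of DC-dot or the RDF Validator.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Keep the familiar prefixes when the document is written back out.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")

# Trimmed version of the generated document, for illustration only.
GENERATED = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://burningbird.net/articles/monsters3.htm">
    <dc:creator>Shelley Powers</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def qualify_about(rdf_xml):
    """Replace unqualified about attributes with rdf:about.

    Returns the rewritten XML and the number of attributes changed.
    """
    root = ET.fromstring(rdf_xml)
    qualified = "{%s}about" % RDF_NS
    changed = 0
    for desc in root.iter("{%s}Description" % RDF_NS):
        if "about" in desc.attrib and qualified not in desc.attrib:
            desc.set(qualified, desc.attrib.pop("about"))
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


fixed_xml, changed = qualify_about(GENERATED)
print(changed)
print(fixed_xml)
```

Note that ElementTree does not preserve a DOCTYPE declaration or the original formatting, so treat the output as a starting point and run it through the validator again.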

Now that you’ve had a chance to try out RDF/XML, it’s time to try out a few of the many, many tools and utilities and APIs that have been created specifically for processing RDF/XML.
