16.6. Ontologies and the Semantic Web

Etymologically, the term “ontology” means the study of existence. In philosophy, ontology is the branch of metaphysics concerned with the fundamental nature of being, addressing deep questions such as “Do nonphysical things exist?”, “Does an object remain identical to itself when it undergoes change?”, and so on. The informatics community later adopted the term “ontology”, typically using it to mean a conceptual model of some business domain, where the model is designed to facilitate sharing information about that domain by conforming to some standard set of constructs.

With the rise of the Internet, millions of documents are now readily accessible over the World Wide Web. However, these documents were created by many different people, typically with little or no thought as to sharing or combining their information with other documents. In 2001, Sir Tim Berners-Lee, the inventor of the World Wide Web, proposed a new vision for a semantic web as an extension of the current web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation” (Berners-Lee et al. 2001). The main idea is to add global identifiers and structure to the documents (e.g., using uniform resource identifiers and embedded tags) to reveal the underlying semantics of what is to be shared, in a way that is accessible to automated agents.

In 1999, the World Wide Web Consortium (W3C) produced the first version of the Resource Description Framework (RDF) language as a standard on top of XML to capture metadata for web resources (e.g., who authored a web document). Other work on adding automatically accessible meaning to language via markup tags was conducted in the United States by the Defense Advanced Research Projects Agency (DARPA), which produced the DARPA Agent Markup Language (DAML). With similar objectives, researchers in the European Union (EU) developed the Ontology Inference Layer (OIL). In 2001, joint work between the United States and the EU incorporated many concepts from OIL into DAML, resulting in the DAML+OIL language, which was submitted in late 2002 to the W3C.

In parallel with this effort, RDF was extended by a simple ontology language, RDF Schema (RDFS), which was adopted in 2004 by the W3C. In the same year, the W3C formally recommended the Web Ontology Language (OWL), which incorporates many aspects from DAML+OIL and RDFS, has a cleaner formal semantics based mainly on description logics, and is currently the most popular language for developing ontologies for the semantic web.

Figure 16.19 summarizes the typical dependencies between layers, where upper layers depend on lower layers. We now look briefly at some of the main concepts underlying RDF, RDFS, and OWL. Detailed coverage of these topics is accessible online using the links in the chapter notes.

Figure 16.19. Foundational layers for the semantic web.


RDF models are basically directed graphs, where each node is either a resource or a literal, and each directed arc represents a binary relationship between the nodes. A resource is anything that can be identified by a Uniform Resource Identifier (URI). A URI is a Uniform Resource Locator (URL) or a Uniform Resource Name (URN).

A URL (e.g., http://www.orm.net) includes a network access type (e.g., http for hypertext transfer protocol) plus a network address to locate the resource. A URN (e.g., urn:isbn:1-55860-672-6) includes a name space (e.g., ISBN for International Standard Book Number) plus a name that identifies that resource within that name space. Unlike a URL, a URN does not provide a way to locate the resource on the web. Both URLs and URNs are global identifiers, so when used in different documents they refer to the same resource.

To be more precise, RDF identifies resources by URI references. A URI reference (or URIref) is a URI, optionally followed by a fragment identifier. For example, the URI reference http://www.w3.org/TR/rdf-primer/#basicconcepts consists of the URI http://www.w3.org/TR/rdf-primer/ and (separated by the “#” character) the fragment identifier basicconcepts. The character strings used for URIrefs are composed of Unicode characters.
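Python's standard urllib.parse module can illustrate both the URL/URN distinction and the split of a URIref into its URI and fragment identifier. The helper describe_uriref below is hypothetical, and treating every "urn" scheme as a name rather than a locator is a simplifying assumption for this sketch.

```python
from urllib.parse import urldefrag, urlparse


def describe_uriref(uriref: str) -> dict:
    """Split a URI reference into URI + fragment and (simplistically)
    classify the URI as a URN or a URL. Hypothetical helper."""
    uri, fragment = urldefrag(uriref)  # fragment is "" when there is no "#"
    parts = urlparse(uri)
    if parts.scheme == "urn":
        # urn:<name space>:<name>; names the resource but does not locate it
        namespace, _, name = parts.path.partition(":")
        return {"kind": "URN", "namespace": namespace, "name": name,
                "uri": uri, "fragment": fragment}
    # Otherwise treat it as a URL: access type (scheme) plus network address
    return {"kind": "URL", "scheme": parts.scheme, "address": parts.netloc,
            "uri": uri, "fragment": fragment}
```

For example, describe_uriref("urn:isbn:1-55860-672-6") reports a URN in the isbn name space, while the rdf-primer URIref yields a URL whose fragment is basicconcepts.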

Figure 16.20 shows a simple RDF model depicted as a graph. Here we show resource nodes as named, soft rectangles; literals as named, hard rectangles; and relationships as named arrows from the subject node to the object node. The graph is intended to convey the semantics that there is a book identified by the ISBN 1-55860-672-6 that is authored by a person named “Terry Halpin”.

Figure 16.20. A simple RDF model.


RDF captures the information as a set of simple statements, each of which may be represented as a triple (subject, predicate, object). The subject and predicate are resources, and hence are identified by URIrefs. The object is either a resource or a literal. Literals may be untyped (as here) or typed (by pairing the value with a data type, which is itself identified by a URIref to a standard XML Schema data type). Each predicate (also known as a property of the subject resource) has a predicate reading (e.g., “authored_by”, “has_name”, “type”), but since the same predicate reading may be used to mean different things in different contexts, a full URIref is required to assign a unique meaning to a predicate.

The resource nodes on the left of Figure 16.20 denote an individual book and an individual person. However, the resource nodes on the right denote the types Book and Person. The type predicate relates an instance to a type, and means “is an instance of”.
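The graph of Figure 16.20 can be sketched as a set of (subject, predicate, object) triples. In this sketch the ex: namespace, the person's URIref, and the predicate and type URIrefs are hypothetical placeholders, since the figure shows only readings; the rdf:type URIref is the standard one from the RDF vocabulary.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[URIRef] = None  # None: untyped literal, as in Figure 16.20


Node = Union[URIRef, Literal]


@dataclass(frozen=True)
class Triple:
    subject: URIRef
    predicate: URIRef
    obj: Node  # the object may be a resource or a literal


EX = "http://example.org/ontology#"  # hypothetical namespace
RDF_TYPE = URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

book = URIRef("urn:isbn:1-55860-672-6")
person = URIRef(EX + "person1")  # hypothetical URIref for the author

graph: Set[Triple] = {
    Triple(book, URIRef(EX + "authored_by"), person),
    Triple(person, URIRef(EX + "has_name"), Literal("Terry Halpin")),
    Triple(book, RDF_TYPE, URIRef(EX + "Book")),
    Triple(person, RDF_TYPE, URIRef(EX + "Person")),
}


def objects(g: Set[Triple], subject: URIRef, predicate: URIRef) -> Set[Node]:
    """Return every object that `subject` is related to by `predicate`."""
    return {t.obj for t in g if t.subject == subject and t.predicate == predicate}
```

Here objects(graph, book, RDF_TYPE) returns the single Book type node. Nothing in the data structure distinguishes rdf:type from has_name, which is exactly the limitation of plain RDF discussed next.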

RDF also allows descriptions of resource collections (e.g., sets and lists), and RDF graphs can be serialized into XML format using RDF/XML, but we ignore these features here.

As the example in Figure 16.20 shows, RDF graphs allow nodes that are instances or types, and allow arcs that represent relationships between instances, or relationships between instances and types. RDF also allows relationships between types and types (e.g., Woman is_a Person). RDF even allows you to state that a type is an instance of itself (e.g., Class is_of_type Class), which can lead to formal problems such as Russell’s paradox. Moreover, no special semantics is assigned to any predicate, so the is_an_instance_of relationship and the is_a_subtype_of relationship are treated the same as the has_name relationship (apart from the different strings used for their URIrefs). So there is no ability to perform inferences based on knowledge of special properties for certain predicates (e.g., transitivity of is_a_subtype_of).

RDFS builds on RDF by providing inbuilt support for classes and subclassing. For example, resources can be typed as classes (using rdfs:Class), and the predefined predicate rdfs:subClassOf is defined to be transitive. For a given predicate, the sets of instances playing the subject and object roles are called its domain and range, respectively. RDFS allows predicates to be typed by restricting their domain (using rdfs:domain) and range (using rdfs:range). Membership in a collection can also be specified using the rdfs:member property.
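To show what inbuilt semantics buys, the following naive forward-chaining sketch applies a few RDFS-style rules to (subject, predicate, object) string triples: rdfs:subClassOf is closed transitively, types propagate from subclass to superclass, and rdfs:domain and rdfs:range assign types to subjects and objects. It is an illustration only, not a complete RDFS reasoner, and the ex: names are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply a few RDFS-style inference rules until no new triples appear."""
    graph = set(triples)
    while True:
        subclass = [(c, d) for c, p, d in graph if p == SUBCLASS_OF]
        new = set()
        # subClassOf is transitive: C subClassOf D, D subClassOf E => C subClassOf E
        for c, d in subclass:
            for d2, e in subclass:
                if d == d2:
                    new.add((c, SUBCLASS_OF, e))
        for s, p, o in graph:
            if p == RDF_TYPE:
                # an instance of a subclass is also an instance of its superclass
                new.update((s, RDF_TYPE, d) for c, d in subclass if c == o)
            elif p in (DOMAIN, RANGE):
                # subjects (domain) or objects (range) of property s are typed o
                for s2, p2, o2 in graph:
                    if p2 == s:
                        new.add((s2 if p == DOMAIN else o2, RDF_TYPE, o))
        if new <= graph:  # fixpoint reached: nothing new was derived
            return graph
        graph |= new
```

Given ex:Woman rdfs:subClassOf ex:Person and ex:ann rdf:type ex:Woman, the closure contains ex:ann rdf:type ex:Person, an inference that plain RDF does not license.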

Despite these improvements, RDFS is still both too inexpressive (e.g., it cannot constrain associations to be other than m:n, and it cannot express complex properties by conjoining existing properties) and too expressive (e.g., like RDF it allows you to state that Class is an instance of itself, and its underlying logic is undecidable).

Like RDFS, OWL is restricted to binary fact types (no n-aries or unaries). OWL has a simpler formal semantics than RDFS, disallowing some freedoms in RDFS, while at the same time adding the ability to capture further constraints. OWL is used to specify the ontology schema (called a terminological box or Tbox) for a business domain, and RDF/XML is used to mark up conforming instance data (the fact population is called an assertional box or Abox). Since information on the web is often incomplete, OWL adopts the open world assumption (the failure to find or infer some proposition does not imply that proposition is false). The OWL specification has three dialects. In increasing order of expressibility, these are OWL Lite, OWL DL, and OWL Full.
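The contrast between the open world assumption and the closed world assumption used by typical database queries can be sketched as a three-valued lookup: under a closed world, failure to find a fact means "false"; under an open world, it means "unknown". The Answer type and ask function are illustrative assumptions, and the explicit set of known-false facts stands in for what an OWL reasoner would have to infer from other axioms.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def ask(fact, known_true, known_false, open_world=True):
    """Answer a yes/no query about `fact` from asserted or inferred facts."""
    if fact in known_true:
        return Answer.TRUE
    if fact in known_false:
        return Answer.FALSE
    # Failure to find the fact: false under a closed world, unknown under OWL's open world
    return Answer.UNKNOWN if open_world else Answer.FALSE
```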

OWL Lite is designed for simple ontologies composed mainly of classification hierarchies and relationships with simple constraints. Based on the description logic SHIF(D), it has low formal complexity and is decidable (an algorithm exists to determine whether any given formula in that logic is a logical truth, hence all its computations are guaranteed to be completed in a finite time).

By default, all relationship types are optional and m:n. OWL Lite allows fact roles to be declared mandatory and fact types to be specified as n:1, 1:n, or 1:1 using minCardinality and maxCardinality restrictions. Figure 16.21 explains the meaning of these restrictions by relating them to an equivalent representation in ORM. The ORM fact type A R B is diagrammed as an RDF graph, where the subject and object nodes are assumed to be classes. The predicate R is directed from A to B.

Figure 16.21. Some correspondences between ORM and OWL.


OWL itself has no graphic notation, so the fact type and its restrictions are declared textually (see example later). Restricting R with a minCardinality of 1 means that each instance of A plays in that relationship type with at least one instance of B. In ORM terms, the first role of R is mandatory. In OWL DL and OWL Full, a minCardinality above 1 may be specified (in ORM this adds a frequency constraint to the mandatory constraint).

Assigning a minCardinality of 0 means that the role is optional. Restricting the maxCardinality to 1 means that each instance of A plays in that relationship type with at most one instance of B. In ORM terms, the first role of R has a simple uniqueness constraint. In OWL DL and OWL Full, a maxCardinality above 1 may be specified (in ORM this replaces the uniqueness constraint by a frequency constraint).

Restricting the maxCardinality to 0 means that each instance of the class being discussed (perhaps in a query), which we’ll call Q, plays in that relationship type with no instance of B. In ORM terms, this could be handled explicitly by a subtype definition as shown, or simply included as a condition in an ORM query. For example, we might define a teetotaller as a person who drinks no alcoholic drink.
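The correspondences above can be made concrete with a small checker that reads minCardinality and maxCardinality as ORM-style mandatory and uniqueness (or frequency) constraints over a known population. Note that this is a closed-world reading: an OWL reasoner, working under the open world assumption, would not report a missing value as a violation, and when an individual has too many values it would infer that some of them denote the same thing unless they are known to be different. The function and data names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def cardinality_violations(
    instances: Iterable[str],
    facts: Iterable[Tuple[str, str]],
    min_card: int = 0,
    max_card: Optional[int] = None,
) -> List[str]:
    """Check each instance of class A against min/max cardinality on R.

    `facts` holds the (a, b) pairs of the fact type A R B. min_card=1 acts
    like an ORM mandatory role, max_card=1 like a uniqueness constraint,
    and max_card=0 forbids the instance from playing the role at all.
    """
    counts = Counter(a for a, _ in set(facts))  # distinct B values per A
    problems = []
    for a in instances:
        n = counts[a]  # Counter yields 0 for instances with no facts
        if n < min_card:
            problems.append(f"{a}: {n} value(s), needs at least {min_card}")
        if max_card is not None and n > max_card:
            problems.append(f"{a}: {n} value(s), allows at most {max_card}")
    return problems
```

For the hasFaxNr example, calling the checker with min_card=0 and max_card=1 flags only people recorded with two or more fax numbers; the teetotaller case corresponds to max_card=0 on the drinks relationship.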

As an example of some cardinality declarations, the following OWL code declares that the hasFaxNr predicate is optional and n:1.

    <!-- hasFaxNr is optional: each instance has zero or more fax numbers -->
    <owl:Restriction>
        <owl:onProperty rdf:resource="#hasFaxNr"/>
        <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">0</owl:minCardinality>
    </owl:Restriction>

    <!-- hasFaxNr is n:1: each instance has at most one fax number -->
    <owl:Restriction>
        <owl:onProperty rdf:resource="#hasFaxNr"/>
        <owl:maxCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:maxCardinality>
    </owl:Restriction>

As a shortcut when the minimum and maximum cardinalities are the same, the value may be assigned to the owl:cardinality property. So setting owl:cardinality to 0 means forbidden, and setting it to 1 means exactly one.

In OWL Lite, predicates may be declared to be transitive (e.g., ancestorOf) or symmetric (e.g., spouseOf). One predicate may be specified as the inverse of another (e.g., isOwnedBy is the inverse of owns). Class extensions may be equated, and the intersection of two classes may be derived.
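The effect of declaring a predicate transitive, symmetric, or the inverse of another can be sketched as functions that add the implied facts to a set of (subject, object) pairs. The helper names are hypothetical and the loops are deliberately naive; a real reasoner derives these facts far more efficiently.

```python
def transitive_closure(pairs):
    """All pairs implied by a transitive predicate such as ancestorOf."""
    closure = set(pairs)
    while True:
        implied = {(a, d) for a, b in closure for c, d in closure if b == c}
        if implied <= closure:  # nothing new: closure is complete
            return closure
        closure |= implied


def symmetric_closure(pairs):
    """All pairs implied by a symmetric predicate such as spouseOf."""
    pairs = set(pairs)
    return pairs | {(b, a) for a, b in pairs}


def inverse(pairs):
    """Pairs of the inverse predicate, e.g. isOwnedBy derived from owns."""
    return {(b, a) for a, b in pairs}
```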

Two URIrefs (perhaps from different documents) may be declared to identify the same individual using the owl:sameAs predicate. The owl:differentFrom predicate can be used to indicate that individuals differ.
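Because owl:sameAs behaves as an equivalence relation (reflexive, symmetric, and transitive), a tool might group URIrefs that denote the same individual with a union-find structure and then check owl:differentFrom assertions against those groups. This is an illustrative sketch with hypothetical URIrefs, not a description of how any particular reasoner works.

```python
class SameAsIndex:
    """Groups URIrefs linked by owl:sameAs into equivalence classes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative URIref of x's group."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def same_as(self, a, b):
        """Record that a and b identify the same individual."""
        self.parent[self.find(a)] = self.find(b)

    def conflicts(self, different_from):
        """owl:differentFrom pairs that the sameAs assertions force to be equal."""
        return [(a, b) for a, b in different_from if self.find(a) == self.find(b)]
```

A non-empty result from conflicts signals an inconsistency between the two kinds of assertion.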

While OWL Lite provides an easy migration path for simple ontologies such as taxonomies, and can be implemented efficiently, it is too weak for modeling complex ontologies or business domains. It cannot express most of the constraints found in such domains, and its derivation capability is limited to simple inferences such as exploiting transitivity.

OWL DL (the “DL” refers to description logic) is based on the stronger SHOIN(D) description logic. This dialect increases the expressive power of the language, while retaining decidability. It includes all the language constructs of OWL Full, but places restrictions on their use. For example, OWL DL is a fragment of first order logic, so it does not allow a class to be an instance of another class.

OWL DL allows classes and values to be enumerated and allows classes to be declared mutually exclusive (using owl:disjointWith). Arbitrary Boolean (and, or, not) combinations of classes are allowed using the owl:intersectionOf, owl:unionOf, and owl:complementOf properties.
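Over a finite, fully known set of individuals, these Boolean class constructors behave like set operations, and owl:disjointWith says that two class extensions share no members. The sketch below uses Python sets as class extensions; the complement is taken relative to an explicitly listed domain, which is a closed-world simplification, since under OWL's open world assumption an individual belongs to a complement only if it is provably not in the class. All class and individual names are hypothetical.

```python
domain = {"ann", "bob", "cy", "di"}  # all known individuals (assumed complete)
Male = {"bob", "cy"}
Female = {"ann", "di"}
Employee = {"ann", "cy"}

FemaleEmployee = Female & Employee   # like owl:intersectionOf
MaleOrEmployee = Male | Employee     # like owl:unionOf
NotEmployee = domain - Employee      # like owl:complementOf (relative to domain)


def disjoint(c1, c2):
    """owl:disjointWith holds if the two extensions have no common member."""
    return c1.isdisjoint(c2)
```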

By allowing arbitrary values for cardinalities, OWL DL allows single-role frequency constraints to be specified, perhaps combined with mandatory constraints (e.g., each duet has two members). Simple subclass definitions may be specified using the owl:hasValue predicate to restrict subclass instances to those instances of the specified superclass that have certain values (e.g., female patients are patients whose gender value is restricted to ‘F’).
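Both constructs in the previous paragraph can be sketched in a few lines: a hasValue-style definition derives the subclass members from the superclass, and an exact cardinality of 2 checks a duet's membership. The data and function names are hypothetical, and the check treats the population as complete (a closed-world reading, as for an ORM frequency constraint).

```python
def has_value_subclass(superclass, prop_values, value):
    """Members of `superclass` whose property includes `value` (cf. owl:hasValue)."""
    return {x for x in superclass if value in prop_values.get(x, set())}


def exactly(members_of, group, n):
    """True if `group` has exactly n members (an exact cardinality/frequency)."""
    return len(members_of.get(group, set())) == n


patients = {"p1", "p2", "p3"}
gender = {"p1": {"F"}, "p2": {"M"}, "p3": {"F"}}
female_patients = has_value_subclass(patients, gender, "F")  # {"p1", "p3"}

duet_members = {"duet1": {"ann", "bob"}, "duet2": {"cy"}}
```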

OWL Full adds even more expressibility by removing various restrictions, allowing any RDF expression whatsoever. For example, OWL Full allows a class to be treated simultaneously as an instance. Hence OWL Full goes beyond description logics and even first order logic. Despite the undecidable nature of OWL Full, it is still not expressive enough to capture many common business rules. Moreover, there is no standard graphic notation for OWL, nor is there a standard high level language for its expression. Compare the earlier example of OWL code with the ORM verbalization “Each Person has at most one FaxNr.” Hence the direct use of OWL tends to be restricted to formal logicians or technically trained developers.

An obvious solution to this problem is to first express the ontology in a conceptual language such as ORM and then have a tool map it automatically to OWL as required. Some recent research efforts have investigated mapping from conceptual models in ORM, UML, and ER to OWL as well as to other description logic-based languages. For example, Jarrar (2007) discusses mapping ORM to SHOIN(D), the description logic underlying OWL DL, and Keet (2007) discusses mapping ORM to the more powerful description logic DLRifd.

Much of ORM can be mapped cleanly to these description logics, facilitating the use of efficient DL-based theorem provers for consistency checking of ORM schemas, and interoperability with other semantic web applications. Unlike OWL Full, ORM has a clean, first order formalization, so problems about types being instances are avoided. However, ORM is richer than OWL in some respects, since some of its structures and constraints cannot be captured in OWL or even DLRifd. This has both good and bad consequences.

On the plus side, ORM is capable of capturing, in a natural fashion, several complex aspects of reality that commonly occur in business domains but cannot be captured in popular description logics. Hence ORM enables higher fidelity models of typical business domains. On the negative side, ORM is so expressive that its underlying logic extends a little beyond decidable fragments of first order logic. This makes it difficult to achieve efficient automated checking of aspects such as schema satisfiability (e.g., in very large schemas with millions of constraints, checking that none of the constraints contradict one another).

This is another example of the trade-off between expressibility and tractability. The more expressive the language, the harder it is to efficiently verify the logical consistency of models expressed in it. Since we really ought to ensure that our models accurately reflect the business domains they are intended to model, it seems reasonable to capture the intended semantics in a language such as ORM, apply efficient automated checking where possible, and use heuristic techniques for the rest.

Examples of ORM features that can’t be translated into SHOIN(D), and hence OWL DL, include uniqueness and frequency constraints spanning multiple roles (of the same or different relationships), set-comparison constraints involving multirole arguments, acyclic ring constraints, and deontic constraints. The DLRifd description logic adds support for identifier and functional dependency constraints, so it can capture internal uniqueness constraints spanning multiple roles as well as external uniqueness constraints.

Unlike ER and UML, ORM is attribute free and hence is easier to relate to RDF, RDFS, OWL, and description logics, which also represent all facts in the form of relationships. Although RDF and OWL use binary relationships only, it is a trivial matter to transform unaries and n-aries into binary form, as discussed in earlier chapters. So there is a fairly obvious mapping of facts and fact types in binary ORM to these semantic web languages. The only aspects that are lost in the mapping are those ORM constraints and derivation rules that have no counterpart in the logic underlying the target language. These can be mapped over as unambiguous verbalizations in ancillary notes.
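One common way to put an n-ary fact into binary form is to objectify each fact as a new node and link that node to each role player with one binary relationship per role; a unary fact can likewise become a Boolean-valued binary fact. The sketch below illustrates only that pattern, which is not the sole way to binarize a fact type; the fact type name, role names, and generated node identifiers are hypothetical.

```python
from itertools import count
from typing import List, Sequence, Tuple

_ids = count(1)  # generates fresh (hypothetical) identifiers for fact nodes


def binarize(fact_type: str, roles: Sequence[str],
             players: Sequence[str]) -> List[Tuple[str, str, str]]:
    """Turn one n-ary fact into binary (subject, predicate, object) facts.

    A fresh node stands for the fact itself; each role becomes a binary
    predicate from that node to the object playing the role.
    """
    if len(roles) != len(players):
        raise ValueError("one player is needed per role")
    node = f"{fact_type}_{next(_ids)}"
    facts = [(node, "rdf:type", fact_type)]
    facts += [(node, role, player) for role, player in zip(roles, players)]
    return facts


def binarize_unary(subject: str, predicate: str) -> Tuple[str, str, str]:
    """Turn a unary fact such as 'ann smokes' into a Boolean-valued binary fact."""
    return (subject, predicate, "true")
```

For example, binarize("Playing", ["person", "sport", "country"], ["ann", "tennis", "AU"]) yields one typing triple plus three role triples, all sharing the same fact node.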

Given the significant potential of ORM for capturing ontologies, it is not surprising that research laboratories are actively developing ORM-based ontology tools, such as DOGMA and T-Lex (Trog et al. 2006). An atomic (elementary or existential) fact type in ORM offers an ideal semantic unit for sharing semantics between different models. Such a unit is constraint free, allowing different models to apply different constraints without altering the semantics of the fact type. For example, in one model the fact type Person drives Car might be optional and m:n, while in another it might be mandatory and n:1. Using a fact-based approach instead of an attribute-based approach facilitates this kind of exchange. This is one reason why the OMG’s SBVR approach for exchanging semantics of business rules is also fact oriented.

At the time of writing, the dream of Sir Tim Berners-Lee for a truly semantic web has not yet materialized. But given the rapidly increasing research efforts underway in both academia and industry on semantic web technology, it seems likely that this dream will be at least partly realized within the next decade or so.
