Man Cannot Live on Facts Alone

The semantic web’s fundamental construct for representing knowledge is called a triple, which is an intuitive and natural way of expressing a fact. As an example, the sentence we’ve considered on many previous occasions, “Mr. Green killed Colonel Mustard in the study with the candlestick,” expressed as a triple might be something like (Mr. Green, killed, Colonel Mustard), where the constituent pieces of that triple refer to the subject, predicate, and object of the sentence. Details such as the room and the weapon would each need additional triples. The Resource Description Framework (RDF) is the semantic web’s model for defining and enabling the exchange of triples. RDF is highly extensible: while it provides a basic foundation for expressing knowledge, it can also be used to define specialized vocabularies called ontologies that provide precise semantics for modeling specific domains. More than a passing mention of specific semantic web technologies such as RDF, RDFa, RDF Schema, and OWL would be well out of scope here at the eleventh hour, but we will work through a high-level example that attempts to explain some of the hype around the semantic web in general.
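
If it helps to see a triple as plain data, here is a minimal Python sketch that models each fact as a (subject, predicate, object) tuple and answers simple pattern queries in which None acts as a wildcard. The Triple class, the match helper, and the extra predicates killedIn and killedWith are hypothetical illustrations, not RDF vocabulary or any library's API. A real RDF store would identify resources with URIs rather than bare strings.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """A single fact: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# Hypothetical identifiers; the last two triples capture the room and weapon.
facts = {
    Triple("MrGreen", "killed", "ColonelMustard"),
    Triple("ColonelMustard", "killedIn", "Study"),
    Triple("ColonelMustard", "killedWith", "Candlestick"),
}


def match(kb, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return the triples that match a pattern, sorted; None matches anything."""
    return sorted(
        t for t in kb
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    )


print(match(facts, p="killed"))
print(match(facts, s="ColonelMustard"))
```

Keeping every fact in the same three-slot shape is what makes pattern queries like these uniform.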

Open-World Versus Closed-World Assumptions

One interesting difference between the way inference works in logic programming languages such as Prolog[64] and the way it works in other technologies, such as the RDF stack, is whether they make open-world or closed-world assumptions about the universe. Logic programming languages such as Prolog and most traditional database systems assume a closed world, while RDF technology generally assumes an open world. In a closed world, everything that you haven’t been explicitly told about the universe should be considered false, whereas in an open world, everything you don’t know is arguably more appropriately handled as being undefined (another way of saying “unknown”). The distinction is that reasoners that assume an open world will not rule out interpretations that include facts that are not explicitly stated in a knowledge base, whereas reasoners that assume a closed world, as Prolog and most database systems do, will rule out such facts. Furthermore, in a system that assumes a closed world, merging contradictory knowledge would generally trigger an error, while a system assuming an open world may try to make new inferences that somehow reconcile the contradictory information. As you might imagine, open-world systems are quite flexible and can lead to some very interesting conundrums, especially when disparate knowledge bases are merged.
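
The difference shows up as soon as you ask about a fact that was never asserted. The following minimal sketch assumes a knowledge base of plain (subject, predicate, object) tuples plus an optional set of explicitly negated facts. Both structures and both helper functions are hypothetical and are not part of RDF or Prolog. The closed-world query treats the missing fact as false. The open-world query returns None for “unknown” and answers False only when a negation has actually been stated.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical knowledge base: one asserted fact and one explicitly negated fact.
KB: Set[Triple] = {("ChuckNorris", "lives", "MtOlympus")}
NEGATED: Set[Triple] = {("Socrates", "a", "god")}


def ask_closed_world(kb: Set[Triple], triple: Triple) -> bool:
    """Closed world: anything not asserted is false."""
    return triple in kb


def ask_open_world(kb: Set[Triple], negated: Set[Triple], triple: Triple) -> Optional[bool]:
    """Open world: True if asserted, False only if explicitly negated, else None (unknown)."""
    if triple in kb:
        return True
    if triple in negated:
        return False
    return None


query = ("Socrates", "lives", "MtOlympus")
print(ask_closed_world(KB, query))         # False
print(ask_open_world(KB, NEGATED, query))  # None
```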

Intuitively, you might think of it like this: systems predicated upon closed-world reasoning assume that the data they are given is complete, and they are typically non-monotonic, meaning that adding new facts can invalidate conclusions that previously held. In contrast, open-world systems make no such assumption about the completeness of their data and are monotonic: adding new facts never retracts a conclusion that was already drawn. There is substantial debate about the merits of making one assumption versus the other, and as someone interested in the semantic web, you should at least be aware of the issue. As the matter specifically relates to RDF, official guidance from the W3C documentation states:[65]
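
Here is a small sketch of why closed-world reasoning tends to be non-monotonic. It uses negation as failure, which is a rule that fires because something is *not* known, and this is exactly what a closed-world assumption licenses. The predicates Dead and Alive and the conclusions function are hypothetical. Adding a single fact withdraws the earlier conclusion that Socrates is alive, while the ordinary rule that men are mortal keeps its conclusion.

```python
def conclusions(kb):
    """Return kb plus everything two toy rules conclude from it."""
    derived = set(kb)
    for s, p, o in kb:
        if p == "a" and o == "Man":
            # Monotonic rule: Man(x) => Mortal(x)
            derived.add((s, "a", "Mortal"))
            # Non-monotonic rule (negation as failure):
            # if x is a man and we are not told x is dead, conclude x is alive.
            if (s, "a", "Dead") not in kb:
                derived.add((s, "a", "Alive"))
    return derived


before = conclusions({("Socrates", "a", "Man")})
after = conclusions({("Socrates", "a", "Man"), ("Socrates", "a", "Dead")})

print(("Socrates", "a", "Alive") in before)  # True
print(("Socrates", "a", "Alive") in after)   # False: the conclusion was withdrawn
print(("Socrates", "a", "Mortal") in after)  # True: the monotonic conclusion survives
```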

To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to make statements about any resource. In general, it is not assumed that complete information about any resource is available. RDF does not prevent anyone from making assertions that are nonsensical or inconsistent with other statements, or the world as people see it. Designers of applications that use RDF should be aware of this and may design their applications to tolerate incomplete or inconsistent sources of information.

You might also check out Peter Patel-Schneider and Ian Horrocks’ “Position Paper: A Comparison of Two Modelling Paradigms in the Semantic Web” if you’re interested in pursuing this topic further. Whether or not you decide to dive into this topic right now, keep in mind that the data available on the Web is incomplete, and that making a closed-world assumption (i.e., considering all unknown information emphatically false) will have severe consequences sooner rather than later.

Inferencing About an Open World with FuXi

Foundational languages such as RDF Schema and OWL are designed so that precise vocabularies can be used to express facts such as the triple (Mr. Green, killed, Colonel Mustard) in a machine-readable way. This is a necessary but not sufficient condition for the semantic web to be fully realized. Generally speaking, once you have a set of facts, the next step is to perform inference over them and draw the conclusions that follow. The concept of formal inference dates back at least to ancient Greece with Aristotle’s syllogisms, and its obvious connection to how machines can reason has not gone unnoticed by researchers in artificial intelligence for the past 50 or so years. The Java-based landscape, with enterprise-level options such as Jena and Sesame, certainly seems to be where most of the heavyweight action resides, but fortunately, we do have a couple of solid options to work with in Python.
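
Vocabularies such as RDF Schema carry their own built-in inference rules, and one of them is essentially the classic syllogism. The following sketch hand-codes two entailment rules from the RDF Schema semantics. The first is rdfs9, under which an instance of a class is also an instance of that class's superclasses. The second is rdfs11, under which rdfs:subClassOf is transitive. The sketch applies these rules to triples written as plain strings such as "rdf:type". It is an illustration under those simplifications rather than a full RDFS reasoner, and the LivingThing class is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply rdfs9 and rdfs11 repeatedly until no new triples appear."""
    graph = set(triples)
    while True:
        pairs = [(s, o) for s, p, o in graph if p == SUBCLASS_OF]
        new = set()
        # rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
        for a, b in pairs:
            for b2, c in pairs:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
        # rdfs9: (x type A) and (A subClassOf B) => (x type B)
        for x, p, a in graph:
            if p == RDF_TYPE:
                for a2, b in pairs:
                    if a == a2:
                        new.add((x, RDF_TYPE, b))
        if new <= graph:  # fixpoint: nothing new was derived
            return graph
        graph |= new


closure = rdfs_closure({
    ("Socrates", RDF_TYPE, "Man"),
    ("Man", SUBCLASS_OF, "Mortal"),
    ("Mortal", SUBCLASS_OF, "LivingThing"),
})
print(("Socrates", RDF_TYPE, "Mortal") in closure)
print(("Socrates", RDF_TYPE, "LivingThing") in closure)
```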

One of the best Pythonic options capable of inference that you’re likely to encounter is FuXi. FuXi is a powerful logic-reasoning system for the semantic web that uses a technique called forward chaining to deduce new information from existing information. Forward chaining starts with a set of facts, derives new facts from the known facts by applying a set of logical rules, and repeats this process until a particular conclusion can be proved or disproved or there are no more new facts to derive. The kind of forward chaining that FuXi delivers is said to be both sound and complete. It is sound because any new facts it produces are true given the facts and rules, and it is complete because any fact that follows from them can eventually be proven. A full-blown discussion of propositional and first-order logic could easily fill a book; if you’re interested in digging deeper, the classic text Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (Prentice Hall) is probably the most comprehensive resource.
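
To make the forward-chaining loop concrete, here is a minimal propositional sketch modeled loosely on the forward-chaining procedure for Horn clauses described in Russell and Norvig’s text. Rules are (premises, conclusion) pairs, and the loop stops as soon as the query is derived or the agenda runs dry. Propositional logic cannot say “all men,” so the rule is grounded for Socrates, and symbol names such as SocratesIsMan are hypothetical. FuXi itself reasons over RDF triples with variables, so treat this as an illustration of the technique rather than of how FuXi is implemented.

```python
from collections import deque


def fc_entails(rules, facts, query):
    """Propositional forward chaining over Horn clauses.

    rules: list of (premises, conclusion) pairs; (["A", "B"], "C") means A and B imply C.
    facts: symbols known to be true at the start.
    Returns True once query is derived; False if the agenda empties first,
    which means "not derivable", not "disproved".
    """
    # Number of still-unsatisfied premises for each rule.
    remaining = [len(set(premises)) for premises, _ in rules]
    inferred = set()
    agenda = deque(facts)
    while agenda:
        symbol = agenda.popleft()
        if symbol == query:
            return True
        if symbol in inferred:
            continue
        inferred.add(symbol)
        for i, (premises, conclusion) in enumerate(rules):
            if symbol in premises:
                remaining[i] -= 1
                if remaining[i] == 0:  # every premise now holds, so fire the rule
                    agenda.append(conclusion)
    return False


rules = [(["SocratesIsMan"], "SocratesIsMortal")]
print(fc_entails(rules, ["SocratesIsMan"], "SocratesIsMortal"))  # True
print(fc_entails(rules, ["SocratesIsMan"], "SocratesIsGod"))     # False
```

Counting unsatisfied premises means each rule is checked only when one of its premises becomes true, instead of rescanning every rule on every pass.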

To demonstrate the kinds of inferencing capabilities a system such as FuXi can provide, let’s consider the famous example of Aristotle’s syllogism[66], in which you are given a knowledge base that contains the facts “Socrates is a man” and “All men are mortal,” which allows you to deduce that “Socrates is mortal.” While this problem may seem trivial, keep in mind that the deterministic algorithms that produce the new fact “Socrates is mortal” work the same way when there are significantly more facts available. Those new facts may produce additional new facts, which produce additional new facts, and so on. For example, consider a slightly more complex knowledge base containing a few additional facts:

  • Socrates is a man

  • All men are mortal

  • Only gods live on Mt Olympus

  • All mortals drink whisky

  • Chuck Norris lives on Mt Olympus
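
Before turning to FuXi, it can be instructive to watch this chain of deductions happen. The sketch below encodes the five statements as triples and as rules with a ?x variable. It applies the rules in rounds and records the round in which each fact first appears. The sketch only supports single-premise rules, which is an assumption that happens to hold for this knowledge base. The biconditional “only gods live on Mt Olympus” is split into two implications. The helper names unify, substitute, and forward_chain are hypothetical.

```python
# Facts as (subject, predicate, object) triples; "a" abbreviates rdf:type, as in N3.
FACTS = {
    ("Socrates", "a", "Man"),
    ("ChuckNorris", "lives", "MtOlympus"),
}

# Single-premise rules: (premise pattern, conclusion pattern); "?x" is a variable.
RULES = [
    (("?x", "a", "Man"), ("?x", "a", "Mortal")),          # All men are mortal
    (("?x", "lives", "MtOlympus"), ("?x", "a", "god")),   # Only gods live on Mt Olympus...
    (("?x", "a", "god"), ("?x", "lives", "MtOlympus")),   # ...and gods do live there
    (("?x", "a", "Mortal"), ("?x", "drinks", "whisky")),  # All mortals drink whisky
]


def unify(pattern, fact):
    """Return variable bindings if fact matches pattern, else None."""
    bindings = {}
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return bindings


def substitute(pattern, bindings):
    """Replace variables in pattern with their bound values."""
    return tuple(bindings.get(term, term) for term in pattern)


def forward_chain(facts, rules):
    """Fire rules in rounds until nothing new appears; map each fact to its round."""
    known = {fact: 0 for fact in facts}
    round_no = 0
    while True:
        round_no += 1
        new = set()
        for premise, conclusion in rules:
            for fact in known:
                bindings = unify(premise, fact)
                if bindings is not None:
                    derived = substitute(conclusion, bindings)
                    if derived not in known:
                        new.add(derived)
        if not new:
            return known
        for fact in new:
            known[fact] = round_no


kb = forward_chain(FACTS, RULES)
for fact, round_no in sorted(kb.items(), key=lambda item: (item[1], item[0])):
    print(round_no, fact)
```

“Socrates is mortal” appears in round 1 and “Socrates drinks whisky” only in round 2, which is the dependency described below. The facts that are new relative to FACTS are the same three that appear in Example 10-2.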

If presented with this knowledge base and then posed the question, “Does Socrates drink whisky?”, you would first have to deduce the fact that “Socrates is mortal” before you could deduce the follow-on fact that “Socrates drinks whisky.” To illustrate how all of this would work in code, consider the same knowledge base expressed in Notation3 (N3), as shown in Example 10-1. N3 is a simple yet powerful syntax for expressing both facts and rules in RDF. While there are many different formats for expressing RDF, many semantic web tools choose N3 because its readability and expressiveness make it accessible.

Example 10-1. A small knowledge base expressed with Notation3

#Assign a namespace for logic predicates
@prefix log: <http://www.w3.org/2000/10/swap/log#> .

#Assign a namespace for the vocabulary defined in this document
@prefix : <MiningTheSocialWeb#> .

#Socrates is a man
:Socrates a :Man.

@forAll :x . 

#All men are mortal: Man(x) => Mortal(x)
{ :x a :Man } log:implies { :x a :Mortal } . 

#Only gods live at Mt Olympus: Lives(x, MtOlympus) <=> God(x)
{ :x :lives :MtOlympus } log:implies { :x a :god } . 
{ :x a :god } log:implies { :x :lives :MtOlympus } . 

#All mortals drink whisky: Mortal(x) => Drinks(x, whisky)
{ :x a :Mortal } log:implies { :x :drinks :whisky } .

#Chuck Norris lives at Mt Olympus: Lives(ChuckNorris, MtOlympus)
:ChuckNorris :lives :MtOlympus .

Assume the knowledge base in Example 10-1 is saved as foo.n3. The --rules option tells FuXi to load the rules from that file, and the --ruleFacts option tells it to also parse the facts in that file and accumulate them as its initial set of facts. You should see output similar to that shown in Example 10-2 if you run FuXi from the command line. Note that the FuXi command should appear in your path after you easy_install fuxi.

Example 10-2. Results of running FuXi from the command line on the knowledge base in Example 10-1

$ FuXi --rules=foo.n3 --ruleFacts

@prefix _7: <file:///Users/matthew/MiningTheSocialWeb#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.

 _7:ChuckNorris a _7:god. 

 _7:Socrates a _7:Mortal;
     _7:drinks _7:whisky. 

The output of the program tells us a few things that weren’t explicitly stated in the initial knowledge base: Chuck Norris is a god, Socrates is a mortal, and Socrates drinks whisky. Although deriving these facts may seem obvious to most human beings, it’s quite another story for a machine to have derived them—and that’s what makes things exciting.

Warning

It should be noted that the careless assertion of facts about Chuck Norris (even in the context of a sample knowledge base) could prove harmful to your health or to the life of your computer.[67] You have been duly warned.

If this simple example excites you, by all means, dig further into FuXi and the potential the semantic web holds. The semantic web is arguably much more advanced and complex than the social web, and investigating it is certainly a very worthy pursuit—especially if you’re excited about the possibilities that inference brings to social data.



[64] You’re highly encouraged to check out a bona fide logic-based programming language like Prolog that’s written in a paradigm designed specifically so that you can represent knowledge and deduce new information from existing facts. GNU Prolog is a fine place to start.

[66] In modern parlance, a syllogism is more commonly called an “implication.”
