Chapter 7. Using Parser to Extract Relationships

Parsing is the process of creating a parse tree for a textual unit. This unit may be a line of code or a sentence. Parsing is relatively easy for computer languages, since they were designed to make the task easy, although this rigidity has made it harder to write code. Natural language parsing is considerably more difficult because natural languages are ambiguous. This ambiguity makes a language difficult to learn but offers great flexibility and expressive power. Here, we are interested in parsing natural languages, not computer languages.

A parse tree is a hierarchical data structure that represents the syntactic structure of a sentence. It is often presented as a tree graph with a root, as we will illustrate shortly. We will use the parse tree to help identify relationships between entities in the tree.

Parsing is used for many tasks, including:

  • Machine translation of languages
  • Synthesizing speech from text
  • Speech recognition
  • Grammar checking
  • Information extraction

Coreference is the condition where two or more expressions in text refer to the same individual or thing, and coreference resolution is the task of identifying those expressions. For example, in this sentence:

"Ted went to the party where he made an utter fool of himself."

The words "Ted", "he", and "himself" refer to the same entity, "Ted". This is important in determining the correct interpretation of text and in determining the relative importance of text sections. We will demonstrate how the Stanford API addresses this problem.

Extracting relationships and information from text is an important NLP task. Relationships may exist between entities, such as the subject of a sentence and either its object, other entities, or perhaps its behavior. We may also want to identify relationships and present them in a structured form. We can use this information either to present the results for immediate use by people or to format the relationships so that they can be better utilized for a downstream task.

In this chapter, we will examine the parsing process and see how the parse tree is used. We will examine the relationship extraction process and investigate relationship types, use extracted relationships, and learn to use NLP APIs.

Relationship types

There are many possible relationship types. A few categories and examples of relationships are found in the following table. An interesting site that contains a multitude of relationships is Freebase (https://www.freebase.com/). It is a database of people, places, and things organized by categories. The WordNet thesaurus (http://wordnet.princeton.edu/) contains a number of relationships.

Relationship      Example
Personal          father-of, sister-of, girlfriend-of
Organizational    subsidiary-of, subcommittee-of
Spatial           near-to, northeast-of, under
Physical          part-of, composed-of
Interactions      bonds-with, associates-with, reacts-with
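
One way to hold relationships like those in the preceding table in a structured form is as a subject, a relationship type, and an object, tagged with a category. The following is a minimal sketch, not part of any NLP library: the class names `RelationshipSketch`, `Relationship`, and `Category` are hypothetical, and the sample entities are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical container for relationships; not taken from any NLP API. */
final class RelationshipSketch {

    /** Categories mirroring the rows of the relationship table. */
    enum Category { PERSONAL, ORGANIZATIONAL, SPATIAL, PHYSICAL, INTERACTION }

    /** A single relationship: type(subject, object), plus its category. */
    static final class Relationship {
        final Category category;
        final String type;
        final String subject;
        final String object;

        Relationship(Category category, String type, String subject, String object) {
            this.category = category;
            this.type = type;
            this.subject = subject;
            this.object = object;
        }

        @Override
        public String toString() {
            return type + "(" + subject + ", " + object + ")";
        }
    }

    /** Returns only the relationships that belong to the given category. */
    static List<Relationship> withCategory(List<Relationship> all, Category category) {
        List<Relationship> matches = new ArrayList<>();
        for (Relationship r : all) {
            if (r.category == category) {
                matches.add(r);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only
        List<Relationship> all = new ArrayList<>();
        all.add(new Relationship(Category.PERSONAL, "father-of", "Ted", "Sue"));
        all.add(new Relationship(Category.PHYSICAL, "part-of", "wheel", "car"));
        all.add(new Relationship(Category.SPATIAL, "near-to", "library", "park"));

        System.out.println(withCategory(all, Category.PHYSICAL)); // prints [part-of(wheel, car)]
    }
}
```

Keeping the category separate from the relationship type makes it straightforward to filter or group relationships before passing them to a downstream task.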

Named Entity Recognition (NER) is a low-level type of NLP classification that was covered in Chapter 4, Finding People and Things. However, many applications need to go beyond this and identify different types of relationships. For example, when NER identifies an entity as an individual, knowing that we are dealing with a person narrows down the relationships that may be present.

Once these entities have been identified, links can be created to their containing documents, or the entities can be used as indexes. For question answering applications, named entities are often used for answers. When the sentiment of a text is determined, it needs to be attributed to some entity.

For example, consider the following input:

He was the last person to see Fred.

Applying the OpenNLP NER to the preceding sentence, as we did in Chapter 4, Finding People and Things, we get the following output:

Span: [7..9) person
Entity: Fred

Using the OpenNLP parser, we get a lot more information about the sentence:

(TOP (S (NP (PRP He)) (VP (VBD was) (NP (NP (DT the) (JJ last) (NN person)) (SBAR (S (VP (TO to) (VP (VB see))))))) (. Fred.)))

Consider the following input:

The cow jumped over the moon.

For the preceding sentence, the parser returns this:

(TOP (S (NP (DT The) (NN cow)) (VP (VBD jumped) (PP (IN over) (NP (DT the) (NN moon))))))
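
The bracketed strings above are a flat rendering of a tree, so a small amount of code can turn them back into a structure that can be walked. The following sketch uses only the Java standard library and is not part of OpenNLP; `BracketedTree`, `Node`, and `phrases` are hypothetical names chosen for this illustration. It reads a bracketed string and lists the words covered by every node with a given tag, such as NP.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads a bracketed parse string into a small tree. */
final class BracketedTree {

    /** A tree node: a tag, an optional word (as in "(NN cow)"), and children. */
    static final class Node {
        final String label;
        String word; // null unless the node directly wraps a word
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    private final String text;
    private int pos = 0;

    private BracketedTree(String text) {
        this.text = text;
    }

    /** Parses one complete bracketed expression and returns its root node. */
    static Node parse(String text) {
        BracketedTree reader = new BracketedTree(text.trim());
        Node root = reader.readNode();
        if (reader.pos != reader.text.length()) {
            throw new IllegalArgumentException("Unexpected text after the root node");
        }
        return root;
    }

    /** Reads "(LABEL child-or-word ...)" starting at the current position. */
    private Node readNode() {
        expect('(');
        skipSpaces();
        Node node = new Node(readToken());
        skipSpaces();
        while (pos < text.length() && text.charAt(pos) != ')') {
            if (text.charAt(pos) == '(') {
                node.children.add(readNode());
            } else {
                node.word = readToken();
            }
            skipSpaces();
        }
        expect(')');
        return node;
    }

    /** Reads characters up to the next space or parenthesis. */
    private String readToken() {
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))
                && text.charAt(pos) != '(' && text.charAt(pos) != ')') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a token at position " + pos);
        }
        return text.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    private void expect(char c) {
        if (pos >= text.length() || text.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    /** Returns the words under each node with the given label, in tree order. */
    static List<String> phrases(Node root, String label) {
        List<String> result = new ArrayList<>();
        collect(root, label, result);
        return result;
    }

    private static void collect(Node node, String label, List<String> result) {
        if (node.label.equals(label)) {
            List<String> words = new ArrayList<>();
            gatherWords(node, words);
            result.add(String.join(" ", words));
        }
        for (Node child : node.children) {
            collect(child, label, result);
        }
    }

    private static void gatherWords(Node node, List<String> words) {
        if (node.word != null) {
            words.add(node.word);
        }
        for (Node child : node.children) {
            gatherWords(child, words);
        }
    }

    public static void main(String[] args) {
        Node root = parse("(TOP (S (NP (DT The) (NN cow)) (VP (VBD jumped) "
                + "(PP (IN over) (NP (DT the) (NN moon))))))");
        System.out.println(phrases(root, "NP")); // prints [The cow, the moon]
    }
}
```

Listing the noun phrases in this way gives candidate entities, while the verb and prepositional phrases between them suggest how those entities are related.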

There are two types of parsing:

  • Dependency: This focuses on the relationship between words
  • Phrase structure: This deals with phrases and their recursive structure

Dependencies can use labels such as subject, determiner, and preposition to find relationships. Parsing techniques include shift-reduce, spanning tree, and cascaded chunking. We are not concerned with the differences between these techniques here, but will focus on the use and outcome of various parsers.
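
To make the idea of labeled dependencies concrete, the following sketch represents a dependency parse as a list of labeled edges and follows the subject and preposition labels to pull out a simple relationship. It rests on several assumptions: the edges for "The cow jumped over the moon" are written by hand in the style of Stanford dependency labels (nsubj, det, prep, pobj) rather than produced by a parser, the names `DependencySketch`, `Edge`, and `extract` are hypothetical, and words are identified by their text alone, which would not be enough for sentences that repeat a word.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of walking labeled dependency edges. */
final class DependencySketch {

    /** One dependency: label(head, dependent), for example nsubj(jumped, cow). */
    static final class Edge {
        final String label;
        final String head;
        final String dependent;

        Edge(String label, String head, String dependent) {
            this.label = label;
            this.head = head;
            this.dependent = dependent;
        }
    }

    /** Returns the dependent of the first edge with this label and head, or null. */
    static String dependentOf(List<Edge> edges, String label, String head) {
        for (Edge e : edges) {
            if (e.label.equals(label) && e.head.equals(head)) {
                return e.dependent;
            }
        }
        return null;
    }

    /** For each subject, follows verb -> preposition -> object and formats the result. */
    static List<String> extract(List<Edge> edges) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (!e.label.equals("nsubj")) {
                continue;
            }
            String verb = e.head;
            String subject = e.dependent;
            String preposition = dependentOf(edges, "prep", verb);
            if (preposition == null) {
                continue;
            }
            String object = dependentOf(edges, "pobj", preposition);
            if (object != null) {
                result.add(subject + " -[" + verb + " " + preposition + "]-> " + object);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written edges (an assumption), not the output of a real parser
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("det", "cow", "The"));
        edges.add(new Edge("nsubj", "jumped", "cow"));
        edges.add(new Edge("prep", "jumped", "over"));
        edges.add(new Edge("det", "moon", "the"));
        edges.add(new Edge("pobj", "over", "moon"));

        System.out.println(extract(edges)); // prints [cow -[jumped over]-> moon]
    }
}
```

A phrase structure parse groups the same words into nested phrases, whereas the edges here connect words directly, which is why dependency output is often convenient for relationship extraction.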
