Extracting relationships from sentences

Knowing the relationships between the elements of a sentence is important in many analysis tasks. It helps in assessing which content of a sentence is important and provides insight into the sentence's meaning. This type of analysis has been used for tasks ranging from grammar checking to speech recognition to language translation.

In the previous section, we demonstrated one approach used to extract the parts of speech. Using this technique, we were able to identify the sentence element types present in a sentence. However, the relationships between these elements are missing. We need to parse the sentence to extract the relationships between its elements.

Using OpenNLP to extract relationships

There are several techniques and APIs that can be used to extract this type of information. In this section, we will use OpenNLP to demonstrate one way of extracting the structure of a sentence. The demonstration is centered around the ParserTool class, which uses a previously trained model. The parsing process returns a score for each candidate parse that indicates how likely it is to be correct. As with many NLP tasks, there are often multiple possible answers.

We start with a try-with-resources block to open an input stream for the model. The en-parser-chunking.bin file contains a model that parses text into its parts of speech and phrase structure. In this case, it is trained for English:

try (InputStream modelInputStream = new FileInputStream(
            new File("en-parser-chunking.bin"))) {
    ...
} catch (Exception ex) {
    // Handle exceptions
}

Within the try block, an instance of the ParserModel class is created using the input stream. The actual parser is then created using the ParserFactory class's create method:

ParserModel parserModel = new ParserModel(modelInputStream); 
Parser parser = ParserFactory.create(parserModel); 

We will use the following sentence to test the parser. The ParserTool class's parseLine method does the actual parsing and returns an array of Parse objects. Each of these objects holds one parsing alternative. The last argument of the parseLine method specifies how many alternatives to return:

String sentence = "Let's parse this sentence."; 
Parse[] parseTrees = ParserTool.parseLine(sentence, parser, 3); 

The next sequence displays each of the possibilities:

for(Parse tree : parseTrees) { 
    tree.show(); 
} 

The output of the show method for this example follows. The tags were previously defined in the Understanding POS tags section:

(TOP (NP (NP (NNP Let's) (NN parse)) (NP (DT this) (NN sentence.))))
(TOP (S (NP (NNP Let's)) (VP (VB parse) (NP (DT this) (NN sentence.)))))
(TOP (S (NP (NNP Let's)) (VP (VBD parse) (NP (DT this) (NN sentence.)))))
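To turn this bracketed notation into explicit relationships, each phrase label can be paired with the words it spans. The following is a minimal sketch that works on the printed string rather than on OpenNLP's Parse objects, so it runs without the library. The ConstituentLister class and its list method are hypothetical names introduced here for illustration, and the code assumes well-formed input like the output above:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
class ConstituentLister {

    // Returns one "LABEL: covered words" entry per phrase-level node,
    // innermost phrases first. Word-level (POS) nodes are skipped.
    static List<String> list(String bracketed) {
        List<String> result = new ArrayList<>();
        parseNode(bracketed, new int[] {0}, result);
        return result;
    }

    // Parses one "(LABEL child ...)" node starting at pos[0] and
    // returns the words it covers. Assumes balanced parentheses.
    private static String parseNode(String s, int[] pos, List<String> result) {
        pos[0]++; // skip '('
        int labelStart = pos[0];
        while (s.charAt(pos[0]) != ' ' && s.charAt(pos[0]) != ')') {
            pos[0]++;
        }
        String label = s.substring(labelStart, pos[0]);

        StringBuilder text = new StringBuilder();
        boolean hasChildNode = false;
        while (s.charAt(pos[0]) != ')') {
            char c = s.charAt(pos[0]);
            if (c == ' ') {
                pos[0]++;
            } else if (c == '(') {
                hasChildNode = true;
                String childText = parseNode(s, pos, result);
                if (text.length() > 0) text.append(' ');
                text.append(childText);
            } else {
                // A bare word inside a POS node such as (NN sentence.)
                int wordStart = pos[0];
                while (s.charAt(pos[0]) != ')' && s.charAt(pos[0]) != ' ') {
                    pos[0]++;
                }
                if (text.length() > 0) text.append(' ');
                text.append(s, wordStart, pos[0]);
            }
        }
        pos[0]++; // skip ')'
        if (hasChildNode) {
            result.add(label + ": " + text);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String tree = "(TOP (S (NP (NNP Let's)) (VP (VB parse) "
                + "(NP (DT this) (NN sentence.)))))";
        for (String entry : list(tree)) {
            System.out.println(entry);
        }
    }
}
```

For the second parse above, this lists the two noun phrases first, then the verb phrase that contains one of them, and finally the S and TOP nodes that span the whole sentence. When working with the library itself, the Parse class's getChildren, getType, and getCoveredText methods expose similar information without any string handling.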

The following example reformats the second and third parse trees to better show the relationships. They differ only in how they tag the verb parse: the first as a base-form verb (VB) and the second as a past-tense verb (VBD):

(TOP
    (S
        (NP (NNP Let's))
        (VP (VB parse)
            (NP (DT this) (NN sentence.))
        )
    )
)
(TOP
    (S
        (NP (NNP Let's))
        (VP (VBD parse)
            (NP (DT this) (NN sentence.))
        )
    )
)
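
A layout like this can also be produced programmatically. The following sketch is one possible way to do it, assuming the single-line bracketed string is already available (for example, captured from the show output). The ParseTreeFormatter class is a hypothetical helper, not part of OpenNLP. It starts a new line at every opening parenthesis and indents it by the current nesting depth, so its layout differs slightly from the hand-formatted version:

```java
// Hypothetical helper, not part of OpenNLP.
class ParseTreeFormatter {

    // Puts each "(" on its own line, indented two spaces per nesting level.
    // Closing parentheses stay on the line of the node they end.
    static String format(String bracketed) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < bracketed.length(); i++) {
            char c = bracketed.charAt(i);
            if (c == '(') {
                if (out.length() > 0) {
                    // Drop the space that separated this node from the
                    // previous text, then start a new line.
                    while (out.length() > 0
                            && out.charAt(out.length() - 1) == ' ') {
                        out.setLength(out.length() - 1);
                    }
                    out.append('\n');
                }
                for (int d = 0; d < depth; d++) {
                    out.append("  ");
                }
                depth++;
            } else if (c == ')') {
                depth--;
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String tree = "(TOP (S (NP (NNP Let's)) (VP (VB parse) "
                + "(NP (DT this) (NN sentence.)))))";
        System.out.println(format(tree));
    }
}
```

Because the indentation follows the nesting depth, a phrase and the sub-phrases it contains line up as parent and children, which makes differences between alternative parses easier to spot.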

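When there are multiple parse alternatives, the Parse class's getProb method returns a score that reflects the model's confidence in each alternative. The score is a log probability, so it is negative, and values closer to zero indicate higher confidence. The following sequence demonstrates this method: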

for (Parse tree : parseTrees) {
    System.out.println("Probability: " + tree.getProb());
}

The output follows:

Probability: -3.6810244423259078
Probability: -3.742475884515823
Probability: -4.16148634555491
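
The negative values suggest that getProb reports log probabilities rather than raw probabilities, which makes them hard to compare at a glance. The following sketch, which assumes the scores are natural logarithms, converts them back with Math.exp and also rescales them so that the returned alternatives sum to 1. The ParseScores class and its relativeWeights method are hypothetical names, and the rescaled values only compare the alternatives that were returned; they are not the model's absolute confidence:

```java
// Hypothetical helper, not part of OpenNLP.
class ParseScores {

    // Rescales log scores so the alternatives sum to 1.
    // Subtracting the maximum first keeps Math.exp numerically stable.
    static double[] relativeWeights(double[] logProbs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double lp : logProbs) {
            max = Math.max(max, lp);
        }
        double[] weights = new double[logProbs.length];
        double sum = 0.0;
        for (int i = 0; i < logProbs.length; i++) {
            weights[i] = Math.exp(logProbs[i] - max);
            sum += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= sum;
        }
        return weights;
    }

    public static void main(String[] args) {
        // Scores copied from the output above.
        double[] logProbs = {
            -3.6810244423259078, -3.742475884515823, -4.16148634555491
        };
        double[] weights = relativeWeights(logProbs);
        for (int i = 0; i < logProbs.length; i++) {
            System.out.printf("exp(score) = %.4f, relative weight = %.3f%n",
                    Math.exp(logProbs[i]), weights[i]);
        }
    }
}
```

The ordering of the alternatives is unchanged by this conversion. It only makes it easier to see how close the first two parses are compared with the third.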

Another interesting NLP task is sentiment analysis, which we will demonstrate next.
