Part-of-speech (POS) tagging probabilistically annotates each word with its grammatical function: noun, verb, adjective, and so on. Usually, POS tagging serves as an input to syntactic and semantic analysis. Let's demonstrate POS tagging using the FACTORIE toolkit, a software library written in Scala (http://factorie.cs.umass.edu). To start, download the binary image or source files from https://github.com/factorie/factorie.git and build it:
$ git clone https://github.com/factorie/factorie.git
...
$ cd factorie
$ git checkout factorie_2.11-1.2
...
$ mvn package -Pnlp-jar-with-dependencies
After the build, which also includes model training, the following command starts a network server on port 3228:
$ bin/fac nlp --wsj-forward-pos --conll-chain-ner
java -Xmx6g -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath ./src/main/resources:./target/classes:./target/factorie_2.11-1.2-nlp-jar-with-dependencies.jar
found model
18232
Listening on port 3228
...
Now, all traffic sent to port 3228 is interpreted as text, and the server responds with the tokenized and annotated output:
$ telnet localhost 3228
Trying ::1...
Connected to localhost.
Escape character is '^]'.
But I warn you, if you don't tell me that this means war, if you still try to defend the infamies and horrors perpetrated by that Antichrist--I really believe he is Antichrist--I will have nothing more to do with you and you are no longer my friend, no longer my 'faithful slave,' as you call yourself! But how do you do? I see I have frightened you--sit down and tell me all the news.
1 1 But CC O
2 2 I PRP O
3 3 warn VBP O
4 4 you PRP O
5 5 , O
6 6 if IN O
7 7 you PRP O
8 8 do VBP O
9 9 n't RB O
10 10 tell VB O
11 11 me PRP O
12 12 that IN O
13 13 this DT O
14 14 means VBZ O
15 15 war NN O
16 16 , , O
17 17 if IN O
18 18 you PRP O
19 19 still RB O
20 20 try VBP O
21 21 to TO O
22 22 defend VB O
23 23 the DT O
24 24 infamies NNS O
25 25 and CC O
26 26 horrors NNS O
27 27 perpetrated VBN O
28 28 by IN O
29 29 that DT O
30 30 Antichrist NNP O
31 31 -- : O
32 1 I PRP O
33 2 really RB O
34 3 believe VBP O
35 4 he PRP O
36 5 is VBZ O
37 6 Antichrist NNP U-MISC
38 7 -- : O
39 1 I PRP O
40 2 will MD O
41 3 have VB O
42 4 nothing NN O
43 5 more JJR O
44 6 to TO O
45 7 do VB O
46 8 with IN O
47 9 you PRP O
48 10 and CC O
49 11 you PRP O
50 12 are VBP O
51 13 no RB O
52 14 longer RBR O
53 15 my PRP$ O
54 16 friend NN O
55 17 , , O
56 18 no RB O
57 19 longer RB O
58 20 my PRP$ O
59 21 ' POS O
60 22 faithful NN O
61 23 slave NN O
62 24 , , O
63 25 ' '' O
64 26 as IN O
65 27 you PRP O
66 28 call VBP O
67 29 yourself PRP O
68 30 ! . O
69 1 But CC O
70 2 how WRB O
71 3 do VBP O
72 4 you PRP O
73 5 do VB O
74 6 ? . O
75 1 I PRP O
76 2 see VBP O
77 3 I PRP O
78 4 have VBP O
79 5 frightened VBN O
80 6 you PRP O
81 7 -- : O
82 8 sit VB O
83 9 down RB O
84 10 and CC O
85 11 tell VB O
86 12 me PRP O
87 13 all DT O
88 14 the DT O
89 15 news NN O
90 16 . . O
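To work with this output programmatically, each row can be read into a small record. The following is a minimal sketch that assumes, from the transcript alone, that the five whitespace-separated columns are a running token index, an index within the current sentence, the token, its POS tag, and its NER label. The names TaggedToken, TaggedOutput, parseLine, and sentences are hypothetical helpers introduced here and are not part of FACTORIE.

```scala
// One row of the server output (column meanings are an assumption based on
// the transcript): running token index, index within the sentence, the
// token itself, its POS tag, and its NER label.
final case class TaggedToken(docIndex: Int, sentIndex: Int, word: String, pos: String, ner: String)

object TaggedOutput {
  // Parse a single whitespace-separated row. Rows that do not have exactly
  // five fields, or whose first two fields are not numeric, yield None.
  def parseLine(line: String): Option[TaggedToken] =
    line.trim.split("\\s+") match {
      case Array(d, s, w, p, n) if d.forall(_.isDigit) && s.forall(_.isDigit) =>
        Some(TaggedToken(d.toInt, s.toInt, w, p, n))
      case _ => None
    }

  // Group tokens into sentences: a new sentence starts whenever the
  // in-sentence index resets to 1.
  def sentences(tokens: Seq[TaggedToken]): Vector[Vector[TaggedToken]] =
    tokens.foldLeft(Vector.empty[Vector[TaggedToken]]) { (acc, t) =>
      if (acc.isEmpty || t.sentIndex == 1) acc :+ Vector(t)
      else acc.init :+ (acc.last :+ t)
    }
}
```

Calling TaggedOutput.parseLine on every line of the response and passing the successfully parsed rows to TaggedOutput.sentences yields one vector of tokens per sentence. Returning None for malformed rows, rather than throwing, lets the caller decide whether to skip or report lines that do not fit the assumed five-column layout.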
This POS tagger is a single-path, left-to-right tagger that can process text as a stream. Internally, the algorithm uses probabilistic techniques to find the most probable tag assignment. Let's also look at other techniques that do not use grammatical analysis, yet have proved very useful for language understanding and interpretation.
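Before moving on, the idea of a single-path, left-to-right tagger can be illustrated with a toy example. The sketch below is not FACTORIE's implementation: the object GreedyTagger, its probability tables, and the fallback score are all invented for illustration. It only shows the general pattern of committing to the best-scoring tag for each word as it arrives, given the tag chosen for the previous word.

```scala
object GreedyTagger {
  // Toy scores, invented for illustration: P(tag | word).
  val emission: Map[String, Map[String, Double]] = Map(
    "I"    -> Map("PRP" -> 1.0),
    "warn" -> Map("VBP" -> 0.7, "NN" -> 0.3),
    "you"  -> Map("PRP" -> 1.0),
    "news" -> Map("NN" -> 0.9, "NNS" -> 0.1)
  )

  // Toy scores, invented for illustration: P(tag | previous tag).
  val transition: Map[(String, String), Double] = Map(
    ("<s>", "PRP") -> 0.4,
    ("PRP", "VBP") -> 0.5,
    ("PRP", "NN")  -> 0.1,
    ("VBP", "PRP") -> 0.3
  )

  // Small fallback score for tag pairs missing from the toy table.
  val unseen: Double = 1e-3

  // Tag left to right, keeping only the single best tag at each step, so
  // the input can be consumed as a stream without revisiting earlier words.
  def tag(words: Seq[String]): Seq[(String, String)] = {
    val (_, tagged) = words.foldLeft(("<s>", Vector.empty[(String, String)])) {
      case ((prev, acc), w) =>
        // Unknown words default to a noun in this toy model.
        val candidates = emission.getOrElse(w, Map("NN" -> 1.0))
        val (best, _) = candidates.maxBy { case (t, pe) =>
          pe * transition.getOrElse((prev, t), unseen)
        }
        (best, acc :+ (w -> best))
    }
    tagged
  }
}
```

With these toy tables, GreedyTagger.tag(Seq("I", "warn", "you")) tags the words as PRP, VBP, and PRP. Because each decision depends only on the current word and the previously chosen tag, the tagger never backtracks, which is what makes stream processing possible; the trade-off is that an early mistake cannot be corrected later.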