Extracting Verbs

Let's extract all the verbs present in the corpus. Here we use the Penn Treebank verb tags: VB, VBD, VBG, VBN, VBP, and VBZ.


verbs = []
for tag in tagged_wt:
    verbs.append([k for k, v in tag if v in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']])

[['extract', 'meaning', 'is', 'analyze'], ['breaking', 'is', 'called', 'are', 'referred'], ['are'], ['has', 'use'], ['is', 'are', 'are', 'are'], ['Using', "'s", 'create', 'counting']]
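The list comprehension keeps only the words whose tag is one of the verb tags. As a minimal, self-contained sketch of the same filtering step, with a hand-tagged sentence standing in for one entry of `tagged_wt` (which is assumed to come from an earlier POS-tagging step):

```python
# Hand-tagged (word, tag) pairs standing in for one sentence of tagged_wt
tagged_sentence = [('Breaking', 'VBG'), ('text', 'NN'), ('into', 'IN'),
                   ('tokens', 'NNS'), ('is', 'VBZ'), ('called', 'VBN'),
                   ('tokenization', 'NN')]

# Penn Treebank verb tags
VERB_TAGS = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

# Keep only the words whose tag marks a verb
verbs = [word for word, tag in tagged_sentence if tag in VERB_TAGS]
print(verbs)  # ['Breaking', 'is', 'called']
```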

Now, let's use spaCy to tokenize a piece of text and access the part-of-speech attribute of each token. As an example application, we'll tokenize the previous paragraph and count the most common nouns with the following code. We'll also lemmatize the tokens, which gives the root form of a word and helps us standardize across its different surface forms:

! pip install -q spacy 
! pip install -q tabulate
! python -m spacy download en_core_web_lg



from collections import Counter
import spacy
from tabulate import tabulate
nlp = spacy.load('en_core_web_lg')

doc = nlp(text)
noun_counter = Counter(token.lemma_ for token in doc if token.pos_ == 'NOUN')

print(tabulate(noun_counter.most_common(5), headers=['Noun', 'Count']))

Noun Count
----------- -------
step 3
combination 2
text 2
processing 2
datum 2
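Note the `datum` row: spaCy's lemmatizer maps the plural "data" to its dictionary singular "datum", so counts are pooled under the lemma. The ranking itself is done by `Counter.most_common`; a small stdlib-only sketch (hypothetical lemma list, no spaCy model required):

```python
from collections import Counter

# Lemmas as spaCy might produce them; note 'data' would appear here as 'datum'
lemmas = ['step', 'combination', 'step', 'text', 'processing',
          'datum', 'step', 'combination', 'text', 'processing', 'datum']

# Counter tallies occurrences; most_common(n) returns the n highest counts
noun_counter = Counter(lemmas)
print(noun_counter.most_common(3))  # [('step', 3), ('combination', 2), ('text', 2)]
```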