N-grams

An n-gram is a sequence of N consecutive tokens. N-grams can be useful for the bag-of-words (BoW) model because, depending on the context, treating a phrase such as "data scientist" as a single token may be more meaningful than treating it as two distinct tokens, "data" and "scientist".
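
To make the idea concrete, here is a minimal pure-Python sketch that slides a window of N tokens over a tokenized sentence. The helper make_ngrams and the sample sentence are hypothetical and only for illustration; they are not part of textacy. With N=2, "data scientist" becomes one feature instead of two.

```python
def make_ngrams(tokens, n):
    """Hypothetical helper: return all n-grams (tuples of n consecutive tokens)."""
    # zip over n staggered views of the token list; zip stops at the shortest view,
    # so the last n-gram ends at the final token
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "the data scientist met another data scientist".split()
bigrams = [" ".join(gram) for gram in make_ngrams(tokens, 2)]
print(bigrams)
# ['the data', 'data scientist', 'scientist met', 'met another', 'another data', 'data scientist']
```

In a BoW model, each joined bigram string can then be counted just like a single word.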

textacy makes it easy to view the n-grams of a given length n that occur at least min_freq times:

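import pandas as pd
from textacy.extract import ngrams

# doc: a spaCy Doc (or Span) holding the text to analyze
# count the bigrams (n=2) that occur at least twice (min_freq=2)
pd.Series([n.text for n in ngrams(doc, n=2, min_freq=2)]).value_counts()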
East Asia 2
Asia Earthquake 2
Tsunami Blog 2
annual Bloggies 2
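
According to its documentation, textacy's ngrams() also filters by default: filter_stops=True removes n-grams that start or end with a stop word, and filter_punct=True removes n-grams containing punctuation-only tokens. The following is a rough standard-library approximation of that filtering combined with the min_freq threshold, not textacy's actual implementation. The function frequent_ngrams, the tiny STOP_WORDS set, and the sample sentence are all hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real pipelines use a full list such as spaCy's
STOP_WORDS = {"the", "a", "of", "in", "and", "to"}

def frequent_ngrams(tokens, n, min_freq=1):
    """Hypothetical approximation of textacy-style n-gram extraction.

    Drops n-grams that start or end with a stop word or that contain a
    punctuation-only token, then keeps those seen at least min_freq times.
    """
    grams = zip(*(tokens[i:] for i in range(n)))
    kept = [
        " ".join(g)
        for g in grams
        if g[0].lower() not in STOP_WORDS
        and g[-1].lower() not in STOP_WORDS
        # a token is punctuation-only if none of its characters is alphanumeric
        and not any(all(not ch.isalnum() for ch in tok) for tok in g)
    ]
    counts = Counter(kept)
    # most_common() orders by descending count, similar to value_counts()
    return {gram: c for gram, c in counts.most_common() if c >= min_freq}

tokens = "a data scientist and a data engineer met a data scientist .".split()
print(frequent_ngrams(tokens, 2, min_freq=2))
# {'data scientist': 2}
```

Lowering min_freq to 1 would also return "data engineer" and "engineer met", each with a count of 1.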