Tokenization

We can tokenize the text in whatever way the task requires. The first step is to split the text into sentences, which this library exposes as a property:

text.sentences

It should return a list of Sentence objects, such as the following:

[Sentence("Twitter is one of the most important social media used in today's world."),
 Sentence("It provides the platform to share people's opinions, facts and information regarding person, place, animals or things."),
 Sentence("These tweets are used by several private, governmental and non-governmental organizations to mine different types of information including business intelligence.")]
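To make the idea of sentence tokenization concrete, here is a minimal standard-library sketch that splits the same text on sentence-ending punctuation. It is only an approximation for illustration: the variable names and the regular expression are assumptions, and the library's own sentence tokenizer is more robust (a naive rule like this one would break on abbreviations such as "Dr." or "e.g.").

```python
import re

# The sample text, reassembled from the three sentences shown above.
raw = (
    "Twitter is one of the most important social media used in today's world. "
    "It provides the platform to share people's opinions, facts and information "
    "regarding person, place, animals or things. "
    "These tweets are used by several private, governmental and non-governmental "
    "organizations to mine different types of information including business "
    "intelligence."
)

# Naive rule (an assumption for illustration): a sentence ends at '.', '!' or '?'
# followed by whitespace. The lookbehind keeps the punctuation with its sentence.
sentences = re.split(r"(?<=[.!?])\s+", raw.strip())

for sentence in sentences:
    print(sentence)
```

On this text the naive rule yields the same three sentences as the library output, because none of them contains an abbreviation or a decimal number.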

We can also list the individual words in the text:

text.words

Run the preceding snippet and study its output. To see how often a particular word occurs, look it up in word_counts:

text.word_counts['twitter']
1

This displays the number of times the given word occurs in the text.
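Note that the lookup uses the lowercase key 'twitter' even though the text contains "Twitter", which suggests that the counts are case-insensitive. The following sketch reproduces that behavior with the standard library only. It is a rough approximation under stated assumptions, not the library's implementation: the token pattern and the names `raw`, `tokens`, and `word_counts` are chosen here for illustration.

```python
import re
from collections import Counter

# The same sample text used throughout this section.
raw = (
    "Twitter is one of the most important social media used in today's world. "
    "It provides the platform to share people's opinions, facts and information "
    "regarding person, place, animals or things. "
    "These tweets are used by several private, governmental and non-governmental "
    "organizations to mine different types of information including business "
    "intelligence."
)

# Lowercase first so that "Twitter" and "twitter" are counted as the same word,
# then pull out runs of letters, digits, and apostrophes as tokens.
tokens = re.findall(r"[a-z0-9']+", raw.lower())

# Counter maps each token to its frequency and returns 0 for unseen words.
word_counts = Counter(tokens)

print(word_counts["twitter"])      # 1
print(word_counts["information"])  # 2
```

A Counter is convenient here because a word that never occurs simply has a count of 0 rather than raising a KeyError.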
