Basics of the NLP pipeline

Textual data is a rich source of information, and handling it properly is crucial to the success of any NLP application. Extracting that information requires a number of basic text processing steps.

Most of the processing steps covered in this section are commonly used in NLP, and they are typically combined into a single executable flow. This flow is what we refer to as the NLP pipeline.

This flow can combine tokenization, stemming, word frequency counting, part-of-speech tagging, and more.

Let's look at how to implement the steps in the NLP pipeline and at what each processing step does. We will use the Natural Language Toolkit (NLTK) package, an NLP toolkit written in Python.

import nltk

# Download the pretrained resources used later in the pipeline:
# 'punkt' provides the tokenizer models, and
# 'averaged_perceptron_tagger' provides the part-of-speech tagger model.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')