Chapter 6. Analyzing Text Data

In this chapter, we will cover the following recipes:

Preprocessing data using tokenization
Stemming text data
Converting text to its base form using lemmatization
Dividing text using chunking
Building a bag-of-words model
Building a text classifier
Identifying the gender
Analyzing the sentiment of a sentence
Identifying patterns in text using topic modeling

Introduction

Text analysis and natural language processing (NLP) are integral parts of modern artificial intelligence systems. Computers are good at processing rigidly structured data with limited variety, but unstructured free-form text is much harder to handle. Developing NLP applications is challenging because computers have a hard time understanding the concepts that underlie the words. There are also many subtle variations in the way we communicate, such as dialects, context, and slang.

To address this problem, NLP applications are built on machine learning. The algorithms detect patterns in text data so that we can extract insights from it. Artificial intelligence companies make heavy use of NLP and text analysis to deliver relevant results. Common applications of NLP include search engines, sentiment analysis, topic modeling, part-of-speech tagging, and entity recognition.

The goal of NLP is to develop a set of algorithms that let us interact with computers in plain English. If we can achieve this, we will no longer need programming languages to tell computers what to do. In this chapter, we will look at a few recipes that focus on text analysis and on extracting meaningful information from text data.

We will use a Python package called Natural Language Toolkit (NLTK) heavily in this chapter, so make sure that you install it before you proceed. You can find the installation steps at http://www.nltk.org/install.html. You also need to install NLTK Data, which contains many corpora and trained models and is an integral part of text analysis. You can find the installation steps at http://www.nltk.org/data.html.
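Before starting the recipes, it can help to confirm that both the package and its data are in place. The following is a minimal sketch, not part of the official installation steps: the helper names nltk_installed and nltk_data_available are hypothetical, and the resource path "tokenizers/punkt" is only an assumed example, since the exact resource names you need depend on your NLTK version and on the recipe.

```python
import importlib.util


def nltk_installed():
    """Return True if the nltk package can be imported in this environment."""
    return importlib.util.find_spec("nltk") is not None


def nltk_data_available(resource_path):
    """Return True if the given NLTK Data resource is found locally.

    resource_path is a path inside the NLTK Data directory. The value used
    below, "tokenizers/punkt", is an assumed example, not a requirement.
    """
    if not nltk_installed():
        return False
    import nltk  # imported lazily so this script also runs without NLTK

    try:
        # nltk.data.find raises LookupError when the resource is missing
        nltk.data.find(resource_path)
        return True
    except LookupError:
        return False


if __name__ == "__main__":
    print("NLTK installed:", nltk_installed())
    print("Example resource present:", nltk_data_available("tokenizers/punkt"))
```

If either check reports False, follow the installation pages linked above before moving on to the first recipe.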
