Recipe 1. Noun Phrase extraction
Recipe 2. Text similarity
Recipe 3. Parts of speech tagging
Recipe 4. Information extraction – NER – Entity recognition
Recipe 5. Topic modeling
Recipe 6. Text classification
Recipe 7. Sentiment analysis
Recipe 8. Word sense disambiguation
Recipe 9. Speech recognition and speech to text
Recipe 10. Text to speech
Recipe 11. Language detection and translation
Before getting into the recipes, let's first understand the NLP pipeline and life cycle. We implement many concepts in this book, and the volume of content can feel overwhelming. To make things simpler and smoother, let's see the flow we need to follow for an NLP solution.
Define the Problem : Understand the customer sentiment across the products.
Understand the depth and breadth of the problem : Understand the customer/user sentiment across the product. Why are we doing this? What is the business impact? Etc.
Data requirement brainstorming : Have a brainstorming activity to list out all possible data points.
All the reviews from customers on e-commerce platforms like Amazon, Flipkart, etc.
Emails sent by customers
Warranty claim forms
Survey data
Call center conversations using speech to text
Feedback forms
Social media data like Twitter, Facebook, and LinkedIn
Data collection : We learned different techniques to collect the data in Chapter 1. Based on the data and the problem, we might have to incorporate different data collection methods. In this case, we can use web scraping and Twitter APIs.
Text Preprocessing : We know that data won’t always be clean. We need to spend a significant amount of time to process it and extract insight out of it using different methods that we discussed earlier in Chapter 2.
Text to feature : As we discussed, texts are characters and machines will have a tough time understanding them. We have to convert them to features that machines and algorithms can understand using any of the methods we learned in the previous chapter.
Machine learning/Deep learning : Machine learning and deep learning are part of the artificial intelligence umbrella; they make systems learn patterns in data automatically without being explicitly programmed. Most NLP solutions are based on this, and since we have converted text to features, we can leverage machine learning or deep learning algorithms to achieve goals like text classification, natural language generation, etc.
Insights and deployment : There is absolutely no use in building NLP solutions if the insights are not properly communicated to the business. Always take time to connect the dots between the model/analysis output and the business, thereby creating the maximum impact.
Recipe 4-1. Extracting Noun Phrases
In this recipe, let us extract noun phrases from the text data (a sentence or documents).
Problem
You want to extract a noun phrase.
Solution
Noun phrase extraction is important when you want to analyze the "who" in a sentence. Let's see an example using TextBlob.
How It Works
Recipe 4-2. Finding Similarity Between Texts
In this recipe, we are going to discuss how to find the similarity between two documents or texts. There are many similarity metrics, like Euclidean, cosine, Jaccard, etc. Applications of text similarity can be found in areas like spelling correction and data deduplication.
Cosine similarity : Calculates the cosine of the angle between the two vectors.
Jaccard similarity : The score is calculated using the intersection or union of words.
Jaccard Index = (the number in both sets) / (the number in either set) * 100.
Levenshtein distance : Minimal number of insertions, deletions, and replacements required for transforming string “a” into string “b.”
Hamming distance : Number of positions at which the symbols differ between the two strings. It is defined only for strings of equal length.
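The metrics above can be sketched in plain Python. The function names and sample strings below are illustrative choices, not from any particular library:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

def jaccard_similarity(s1, s2):
    """Intersection over union of the word sets of two sentences."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / len(w1 | w2)

def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def levenshtein_distance(a, b):
    """Minimal insertions, deletions, and replacements to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, c1 in enumerate(a, 1):
        curr = [i]
        for j, c2 in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (c1 != c2)))   # replacement
        prev = curr
    return prev[-1]

print(jaccard_similarity("I like NLP", "I love NLP"))  # 0.5
print(hamming_distance("karolin", "kathrin"))          # 3
print(levenshtein_distance("kitten", "sitting"))       # 3
```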
Problem
You want to find the similarity between texts/documents.
Solution
The simplest way to do this is by using cosine similarity from the sklearn library.
How It Works
Let’s follow the steps in this section to compute the similarity score between text documents.
Step 2-1 Create/read the text data
Step 2-2 Find the similarity
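A minimal sketch using sklearn, as suggested in the solution, is shown below. The sample documents are illustrative assumptions:

```python
# A hedged sketch of document similarity: TF-IDF vectors from
# sklearn plus cosine similarity. The sample documents are mine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = (
    "I like NLP",
    "I am exploring NLP",
    "I am a beginner in NLP",
    "I want to learn NLP",
    "I like advanced NLP",
)

# Fit TF-IDF on all documents, then compare the first one to every document.
tfidf = TfidfVectorizer().fit_transform(documents)
scores = cosine_similarity(tfidf[0:1], tfidf)
print(scores)
```

The first entry is 1.0 (a document compared with itself), and the last document scores highest among the rest because it shares the word "like" with the first.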
If we observe closely, the first sentence and the last sentence have higher similarity than the rest of the sentences.
Phonetic matching
1. Install and import the library
!pip install fuzzy
import fuzzy
2. Run the Soundex function
soundex = fuzzy.Soundex(4)
3. Generate the phonetic form
soundex('natural')    #output 'N364'
soundex('natuaral')   #output 'N364'
soundex('language')   #output 'L52'
soundex('processing') #output 'P625'
Soundex treats "natural" and "natuaral" as the same; the phonetic code for both strings is "N364." For "language" and "processing," the codes are "L52" and "P625," respectively.
Recipe 4-3. Tagging Part of Speech
Part of speech (POS) tagging is another crucial part of natural language processing that involves labeling words with a part of speech such as noun, verb, adjective, etc. POS tagging is the base for Named Entity Recognition, Sentiment Analysis, Question Answering, and Word Sense Disambiguation.
Problem
Tagging the parts of speech for a sentence.
Solution
Rule based - Manually created rules that tag a word as belonging to a particular POS.
Stochastic based - These algorithms capture the sequence of words and assign tags based on the probability of the sequence, using hidden Markov models.
How It Works
Again, NLTK has a widely used POS tagging module. nltk.pos_tag() is the function that generates the POS tag for each token in a list: tokenize the document first and pass the tokens in a single call.
Step 3-1 Store the text in a variable
Step 3-2 NLTK for POS
CC coordinating conjunction
CD cardinal digit
DT determiner
EX existential there (like: “there is” ... think of it like “there exists”)
FW foreign word
IN preposition/subordinating conjunction
JJ adjective ‘big’
JJR adjective, comparative ‘bigger’
JJS adjective, superlative ‘biggest’
LS list marker 1)
MD modal could, will
NN noun, singular ‘desk’
NNS noun plural ‘desks’
NNP proper noun, singular ‘Harrison’
NNPS proper noun, plural ‘Americans’
PDT predeterminer ‘all the kids’
POS possessive ending parent’s
PRP personal pronoun I, he, she
PRP$ possessive pronoun my, his, hers
RB adverb very, silently
RBR adverb, comparative better
RBS adverb, superlative best
RP particle give up
TO to go ‘to’ the store
UH interjection
VB verb, base form take
VBD verb, past tense took
VBG verb, gerund/present participle taking
VBN verb, past participle taken
VBP verb, sing. present, non-3rd person take
VBZ verb, 3rd person sing. present takes
WDT wh-determiner which
WP wh-pronoun who, what
WP$ possessive wh-pronoun whose
WRB wh-adverb where, when
Recipe 4-4. Extract Entities from Text
In this recipe, we are going to discuss how to identify and extract entities from text, a task called Named Entity Recognition (NER). There are multiple libraries to perform this task, like NLTK chunker, StanfordNER, spaCy, OpenNLP, and NeuroNER; and there are also many APIs, like Watson NLU, AlchemyAPI, NERD, Google Cloud NLP API, and many more.
Problem
You want to identify and extract entities from the text.
Solution
The simplest way to do this is by using ne_chunk from NLTK, or spaCy.
How It Works
Let’s follow the steps in this section to perform NER.
Step 4-1 Read/create the text data
Step 4-2 Extract the entities
Execute the below code.
Using NLTK
Using SpaCy
According to the output, Apple is an organization, 10000 is money, and New York is a place. The results are accurate and can be used for many NLP applications.
Recipe 4-5. Extracting Topics from Text
In this recipe, we are going to discuss how to identify topics from the document. Say, for example, there is an online library with multiple departments based on the kind of book. As the new book comes in, you want to look at the unique keywords/topics and decide on which department this book might belong to and place it accordingly. In these kinds of situations, topic modeling would be handy.
Basically, this is document tagging and clustering.
Problem
You want to extract or identify topics from the document.
Solution
The simplest way to do this is by using the gensim library.
How It Works
Let’s follow the steps in this section to identify topics within documents using gensim.
Step 5-1 Create the text data
Step 5-2 Cleaning and preprocessing
Step 5-3 Preparing document term matrix
Step 5-4 LDA model
All the weights associated with the topics from the sentences seem almost similar. You can run this on large data to extract significant topics. The whole idea of implementing this on sample data is to make you familiar with it; you can use the same code snippet on large data for meaningful results and insights.
Recipe 4-6. Classifying Text
Text classification – The aim of text classification is to automatically classify text documents into predefined categories.
Sentiment Analysis
Document classification
Spam – ham mail classification
Resume shortlisting
Document summarization
Problem
Spam - ham classification using machine learning.
Solution
If you observe, Gmail has a folder called “Spam.” It basically classifies your emails into spam and ham so that you don’t have to read unnecessary emails.
How It Works
Let’s follow the step-by-step method to build the classifier.
Step 6-1 Data collection and understanding
Please download data from the below link and save it in your working directory:
Step 6-2 Text processing and feature engineering
Step 6-3 Model training
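The training step can be sketched with sklearn as below. A tiny in-memory toy sample stands in for the downloaded dataset, and the pipeline shape (TF-IDF features feeding multinomial Naive Bayes) is an illustrative assumption:

```python
# A hedged sketch of spam/ham classification with sklearn's
# Naive Bayes. The toy emails stand in for the real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize claim your money now",
    "free entry claim cash prize winner",
    "lowest price free offer win money",
    "are we still meeting for lunch tomorrow",
    "please review the attached project report",
    "schedule the team meeting for monday",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Vectorize the text and train the classifier in one pipeline.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["claim your free prize money"]))
print(model.predict(["let us schedule a meeting to review the report"]))
```

On real data you would hold out a test split and compare this against a linear classifier, as the text notes.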
Naive Bayes is giving better results than the linear classifier. We can try many more classifiers and then choose the best one.
Recipe 4-7. Carrying Out Sentiment Analysis
In this recipe, we are going to discuss how to understand the sentiment of a particular sentence or statement. Sentiment analysis is one of the widely used techniques across the industries to understand the sentiments of the customers/users around the products/services. Sentiment analysis gives the sentiment score of a sentence/statement tending toward positive or negative.
Problem
You want to do a sentiment analysis.
Solution
The simplest way to do this is by using the TextBlob or VADER library.
How It Works
Polarity : Polarity lies in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement.
Subjectivity : Subjectivity lies in the range [0, 1] and indicates how far the text is a personal opinion rather than factual information.
Step 7-1 Create the sample data
Step 7-2 Cleaning and preprocessing
Refer to Chapter 2, Recipe 2-10, for this step.
Step 7-3 Get the sentiment scores
This is a negative review, as the polarity is “-0.68.”
Note: We will cover one real-world use case of sentiment analysis with an end-to-end implementation in the next chapter, Recipe 5-2.
Recipe 4-8. Disambiguating Text
Ambiguity arises because words can have different meanings in different contexts.
For example, the word “bank” can mean a financial institution or sloping land, depending on the context of the sentence.
Problem
You want to disambiguate the sense of a word based on its context.
Solution
The Lesk algorithm is one of the best-known algorithms for word sense disambiguation. Let’s see how to solve this using the pywsd and nltk packages.
How It Works
Below are the steps to achieve the results.
Step 8-1 Import libraries
Step 8-2 Disambiguating word sense
Observe that in context-1, “bank” is a financial institution, but in context-2, “bank” is sloping land.
Recipe 4-9. Converting Speech to Text
Converting speech to text is a very useful NLP technique.
Problem
You want to convert speech to text.
Solution
The simplest way to do this is by using the SpeechRecognition and PyAudio libraries.
How It Works
Let’s follow the steps in this section to implement speech to text.
Step 9-1 Understanding/defining business problem
Interaction with machines is trending toward voice, the usual mode of human communication. Popular examples are Siri, Alexa, and Google Assistant.
Step 9-2 Install and import necessary libraries
Step 9-3 Run below code
Recipe 4-10. Converting Text to Speech
Converting text to speech is another useful NLP technique.
Problem
You want to convert text to speech.
Solution
The simplest way to do this is by using the gTTS library.
How It Works
Let’s follow the steps in this section to implement text to speech.
Step 10-1 Install and import necessary libraries
Step 10-2 Run below code, gTTS function
Recipe 4-11. Translating Speech
Language detection and translation.
Problem
Whenever you try to analyze data from blogs that are hosted across the globe, especially websites from countries like China, where Chinese is used predominantly, analyzing such data or performing NLP tasks on such data would be difficult. That’s where language translation comes to the rescue. You want to translate one language to another.
Solution
The easiest way to do this is by using the goslate library.
How It Works
Let’s follow the steps in this section to implement language translation in Python.
Step 11-1 Install and import necessary libraries
Step 11-2 Input text
Step 11-3 Run goslate function
Well, that feels like quite an accomplishment, doesn’t it? We have implemented many advanced NLP applications and techniques. That’s not all, folks; we have a couple more interesting chapters ahead, where we will look at industrial applications of NLP, their solution approaches, and end-to-end implementations.