© Akshay Kulkarni and Adarsha Shivananda 2019
Akshay Kulkarni and Adarsha Shivananda, Natural Language Processing Recipes, https://doi.org/10.1007/978-1-4842-4267-4_4

4. Advanced Natural Language Processing

Akshay Kulkarni1  and Adarsha Shivananda1
(1)
Bangalore, Karnataka, India
 
In this chapter, we are going to cover various advanced NLP techniques that leverage machine learning algorithms to extract information from text data, along with some advanced NLP applications, their solution approaches, and implementations.
  • Recipe 1. Noun Phrase extraction

  • Recipe 2. Text similarity

  • Recipe 3. Parts of speech tagging

  • Recipe 4. Information extraction: Named entity recognition (NER)

  • Recipe 5. Topic modeling

  • Recipe 6. Text classification

  • Recipe 7. Sentiment analysis

  • Recipe 8. Word sense disambiguation

  • Recipe 9. Speech recognition and speech to text

  • Recipe 10. Text to speech

  • Recipe 11. Language detection and translation

Before getting into the recipes, let's first understand the NLP pipeline and life cycle. We implement many concepts in this book, and the volume of content can feel overwhelming. To make it simpler and smoother, let's look at the flow we need to follow for an NLP solution.

For example, let’s consider customer sentiment analysis and prediction for a product or brand or service.
  • Define the problem : Understand the customer sentiment across products.

  • Understand the depth and breadth of the problem : Why are we doing this? What is the business impact? And so on.

  • Data requirement brainstorming : Have a brainstorming activity to list out all possible data points.
    • All the reviews from customers on e-commerce platforms like Amazon, Flipkart, etc.

    • Emails sent by customers

    • Warranty claim forms

    • Survey data

    • Call center conversations using speech to text

    • Feedback forms

    • Social media data like Twitter, Facebook, and LinkedIn

  • Data collection : We learned different techniques to collect the data in Chapter 1. Based on the data and the problem, we might have to incorporate different data collection methods. In this case, we can use web scraping and Twitter APIs.

  • Text Preprocessing : We know that data won’t always be clean. We need to spend a significant amount of time to process it and extract insight out of it using different methods that we discussed earlier in Chapter 2.

  • Text to feature : As we discussed, text is made of characters, which machines have a tough time understanding directly. We have to convert it into features that machines and algorithms can understand, using any of the methods we learned in the previous chapter.

  • Machine learning/Deep learning : Machine learning and deep learning are part of the artificial intelligence umbrella; they allow systems to automatically learn patterns in data without being explicitly programmed. Most NLP solutions are based on this, and since we have converted text to features, we can leverage machine learning or deep learning algorithms to achieve goals like text classification, natural language generation, etc.

  • Insights and deployment : There is no use in building NLP solutions unless the resulting insights are properly communicated to the business. Always take time to connect the dots between the model/analysis output and the business, thereby creating the maximum impact.

Recipe 4-1. Extracting Noun Phrases

In this recipe, let us extract noun phrases from text data (a sentence or documents).

Problem

You want to extract a noun phrase.

Solution

Noun Phrase extraction is important when you want to analyze the “who” in a sentence. Let’s see an example below using TextBlob.

How It Works

Execute the below code to extract noun phrases.
#Import libraries
import nltk
from textblob import TextBlob
#Extract noun
blob = TextBlob("John is learning natural language processing")
for np in blob.noun_phrases:
    print(np)
Output:
john
natural language processing
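If you are already using spaCy elsewhere in your pipeline, its noun_chunks attribute gives similar results. Here is a minimal alternative sketch (this assumes an English spaCy model is installed; it is not part of the original recipe):
#Alternative: noun chunks with spaCy
import spacy
nlp = spacy.load('en')   # on newer spaCy versions, use 'en_core_web_sm'
doc = nlp("John is learning natural language processing")
for chunk in doc.noun_chunks:
    print(chunk.text)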

Recipe 4-2. Finding Similarity Between Texts

In this recipe, we are going to discuss how to find the similarity between two documents or pieces of text. There are many similarity metrics, like Euclidean distance, cosine similarity, Jaccard similarity, etc. Applications of text similarity can be found in areas like spelling correction and data deduplication.

Here are a few of the similarity measures:
  • Cosine similarity : Calculates the cosine of the angle between the two vectors.

  • Jaccard similarity : The score is calculated as the size of the intersection of the word sets divided by the size of their union.

  • Jaccard Index = (the number of words in both sets) / (the number of words in either set) * 100.

  • Levenshtein distance : Minimal number of insertions, deletions, and replacements required for transforming string "a" into string "b."

  • Hamming distance : Number of positions at which the symbols differ in two strings. It can be defined only for strings of equal length. (A small pure-Python sketch of some of these measures follows this list.)
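As a quick illustration of the measures above, here is a minimal pure-Python sketch (for intuition only; in practice you would typically use a library implementation):
#Minimal illustrations of Jaccard, Levenshtein, and Hamming
def jaccard_similarity(a, b):
    # word-level Jaccard: intersection over union of the word sets
    set_a, set_b = set(a.split()), set(b.split())
    return len(set_a & set_b) / len(set_a | set_b)

def levenshtein_distance(a, b):
    # dynamic programming: minimal insertions, deletions, and replacements
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # replacement
        prev = curr
    return prev[-1]

def hamming_distance(a, b):
    # defined only for equal-length strings: count differing positions
    assert len(a) == len(b)
    return sum(ca != cb for ca, cb in zip(a, b))

print(jaccard_similarity("I like NLP", "I like advanced NLP"))  # 0.75
print(levenshtein_distance("natural", "natuaral"))              # 1
print(hamming_distance("karolin", "kathrin"))                   # 3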

Problem

You want to find the similarity between texts/documents.

Solution

The simplest way to do this is by using cosine similarity from the sklearn library.

How It Works

Let’s follow the steps in this section to compute the similarity score between text documents.

Step 2-1 Create/read the text data

Here is the data:
documents = (
"I like NLP",
"I am exploring NLP",
"I am a beginner in NLP",
"I want to learn NLP",
"I like advanced NLP"
)

Step 2-2 Find the similarity

Execute the below code to find the similarity.
#Import libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
#Compute tfidf : feature engineering(refer previous chapter – Recipe 3-4)
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
tfidf_matrix.shape
#output
(5, 10)
#compute similarity for first sentence with rest of the sentences
cosine_similarity(tfidf_matrix[0:1],tfidf_matrix)
#output
array([[ 1.       ,  0.17682765,  0.14284054,  0.13489366,  0.68374784]])

If we observe closely, the first and last sentences have the highest similarity (0.68), which makes sense since "I like NLP" and "I like advanced NLP" share most of their words.
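If you want to pick out the closest match programmatically rather than by eyeballing the array, here is a small sketch reusing the tfidf_matrix computed above (assuming numpy is available):
#Find the document most similar to the first one (excluding itself)
import numpy as np
scores = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix).flatten()
scores[0] = 0                     # ignore the self-match
print(documents[np.argmax(scores)])
#output
I like advanced NLP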

Phonetic matching

The next flavor of similarity checking is phonetic matching, which roughly matches two words or sentences by creating an alphanumeric string as an encoded version of the text or word. It is very useful for searching large text corpora, correcting spelling errors, and matching relevant names. Soundex and Metaphone are the two main phonetic algorithms used for this purpose. The simplest way to do this is by using the fuzzy library.
  1. Install and import the library
     !pip install fuzzy
     import fuzzy

  2. Run the Soundex function
     soundex = fuzzy.Soundex(4)

  3. Generate the phonetic form
     soundex('natural')
     #output
     'N364'
     soundex('natuaral')
     #output
     'N364'
     soundex('language')
     #output
     'L52'
     soundex('processing')
     #output
     'P625'

Soundex treats "natural" and "natuaral" as the same; the phonetic code for both strings is "N364." For "language" and "processing," the codes are "L52" and "P625," respectively.
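One common use of these codes is fuzzy name matching, for example while deduplicating customer records. Here is a minimal sketch reusing the soundex object created above (the names are only illustrative):
#Match names that sound alike by comparing Soundex codes
names = ['Robert', 'Rupert', 'Rubin']
query = 'Robert'
matches = [name for name in names if soundex(name) == soundex(query)]
print(matches)
#output
['Robert', 'Rupert']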

Recipe 4-3. Tagging Part of Speech

Part of speech (POS) tagging is another crucial part of natural language processing that involves labeling words with a part of speech such as noun, verb, adjective, etc. POS tagging is the base for named entity recognition, sentiment analysis, question answering, and word sense disambiguation.

Problem

You want to tag the parts of speech for a sentence.

Solution

There are two ways a tagger can be built.
  • Rule based - Rules created manually, which tag a word as belonging to a particular POS.

  • Stochastic based - These algorithms capture the sequence of words and tag the probability of the sequence using hidden Markov models.

How It Works

NLTK has a widely used POS tagging module. nltk.pos_tag(tokens) is the function that generates the POS tags for a list of tokens. Use a for loop to generate POS tags for all the words present in the document.

Step 3-1 Store the text in a variable

Here is the variable:
text = "I love NLP and I will learn NLP in 2 month"

Step 3-2 NLTK for POS

Now the code:
# Importing necessary packages and stopwords
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
stop_words = set(stopwords.words('english'))
# Tokenize the text
tokens = sent_tokenize(text)
#Generate tagging for all the tokens using loop
for i in tokens:
    words = nltk.word_tokenize(i)
    words = [w for w in words if not w in stop_words]
    #  POS-tagger.
    tags = nltk.pos_tag(words)
tags
Results:
[('I', 'PRP'),
 ('love', 'VBP'),
 ('NLP', 'NNP'),
 ('I', 'PRP'),
 ('learn', 'VBP'),
 ('NLP', 'RB'),
 ('2month', 'CD')]
Below are the short forms and explanations of the POS tags. The word "love" is tagged VBP, which means verb, non-3rd person singular present.
  • CC coordinating conjunction

  • CD cardinal digit

  • DT determiner

  • EX existential there (like: “there is” ... think of it like “there exists”)

  • FW foreign word

  • IN preposition/subordinating conjunction

  • JJ adjective ‘big’

  • JJR adjective, comparative ‘bigger’

  • JJS adjective, superlative ‘biggest’

  • LS list marker 1)

  • MD modal could, will

  • NN noun, singular ‘desk’

  • NNS noun plural ‘desks’

  • NNP proper noun, singular ‘Harrison’

  • NNPS proper noun, plural ‘Americans’

  • PDT predeterminer ‘all the kids’

  • POS possessive ending parent’s

  • PRP personal pronoun I, he, she

  • PRP$ possessive pronoun my, his, hers

  • RB adverb very, silently

  • RBR adverb, comparative better

  • RBS adverb, superlative best

  • RP particle give up

  • TO to go ‘to’ the store

  • UH interjection

  • VB verb, base form take

  • VBD verb, past tense took

  • VBG verb, gerund/present participle taking

  • VBN verb, past participle taken

  • VBP verb, non-3rd person singular present take

  • VBZ verb, 3rd person sing. present takes

  • WDT wh-determiner which

  • WP wh-pronoun who, what

  • WP$ possessive wh-pronoun whose

  • WRB wh-adverb where, when

Recipe 4-4. Extract Entities from Text

In this recipe, we are going to discuss how to identify and extract entities from text, a task called named entity recognition (NER). There are multiple libraries to perform this task, like the NLTK chunker, StanfordNER, spaCy, OpenNLP, and NeuroNER, as well as many APIs like Watson NLU, AlchemyAPI, NERD, Google Cloud NLP API, and more.

Problem

You want to identify and extract entities from the text.

Solution

The simplest way to do this is by using ne_chunk from NLTK or by using spaCy.

How It Works

Let’s follow the steps in this section to perform NER.

Step 4-1 Read/create the text data

Here is the text:
sent = "John is studying at Stanford University in California"

Step 4-2 Extract the entities

Execute the below code.

Using NLTK
#import libraries
import nltk
from nltk import ne_chunk
from nltk import word_tokenize
#NER
ne_chunk(nltk.pos_tag(word_tokenize(sent)), binary=False)
#output
Tree('S', [Tree('PERSON', [('John', 'NNP')]), ('is', 'VBZ'), ('studying', 'VBG'), ('at', 'IN'), Tree('ORGANIZATION', [('Stanford', 'NNP'), ('University', 'NNP')]), ('in', 'IN'), Tree('GPE', [('California', 'NNP')])])
Here "John" is tagged as "PERSON," "Stanford University" as "ORGANIZATION," and "California" as "GPE" (geopolitical entity, i.e., countries, cities, states).
Using SpaCy
import spacy
nlp = spacy.load('en')   # loads the English model; on newer spaCy versions use 'en_core_web_sm'
# Read/create a sentence
doc = nlp(u'Apple is ready to launch new phone worth $10000 in New york time square ')
for ent in doc.ents:
   print(ent.text, ent.start_char, ent.end_char, ent.label_)
#output
Apple 0 5 ORG
10000 42 47 MONEY
New york 51 59 GPE

According to the output, Apple is an organization, 10000 is money, and New York is a place (GPE). The results are fairly accurate and can be used in downstream NLP applications.
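If you prefer the NLTK route shown earlier and need the entities as a flat list of (text, label) pairs rather than a Tree, here is a small sketch reusing sent and the NLTK imports from above:
#Collect (entity text, label) pairs from the chunked tree
chunked = ne_chunk(nltk.pos_tag(word_tokenize(sent)), binary=False)
entities = []
for subtree in chunked.subtrees():
    if subtree.label() != 'S':          # skip the sentence-level root
        entity = " ".join(word for word, tag in subtree.leaves())
        entities.append((entity, subtree.label()))
print(entities)
#output
[('John', 'PERSON'), ('Stanford University', 'ORGANIZATION'), ('California', 'GPE')]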

Recipe 4-5. Extracting Topics from Text

In this recipe, we are going to discuss how to identify topics from documents. Say, for example, there is an online library with multiple departments based on the kind of book. As a new book comes in, you want to look at its unique keywords/topics and decide which department the book might belong to, and place it accordingly. In these kinds of situations, topic modeling is handy.

Basically, this is document tagging and clustering.

Problem

You want to extract or identify topics from the document.

Solution

The simplest way to do this is by using the gensim library.

How It Works

Let's follow the steps in this section to identify topics within documents using gensim.

Step 5-1 Create the text data

Here is the text:
doc1 = "I am learning NLP, it is very interesting and exciting. it includes machine learning and deep learning"
doc2 = "My father is a data scientist and he is nlp expert"
doc3 = "My sister has good exposure into android development"
doc_complete = [doc1, doc2, doc3]
doc_complete
#output
['I am learning NLP, it is very interesting and exciting. it includes machine learning and deep learning',
 'My father is a data scientist and he is nlp expert',
 'My sister has good exposure into android development']

Step 5-2 Cleaning and preprocessing

Next, we clean it up:
# Install and import libraries
!pip install gensim
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer
import string
# Text preprocessing as discussed in chapter 2
stop = set(stopwords.words('english'))
exclude = set(string.punctuation)
lemma = WordNetLemmatizer()
def clean(doc):
    stop_free = " ".join([i for i in doc.lower().split() if i not in stop])
    punc_free = ''.join(ch for ch in stop_free if ch not in exclude)
    normalized = " ".join(lemma.lemmatize(word) for word in punc_free.split())
    return normalized
doc_clean = [clean(doc).split() for doc in doc_complete]
doc_clean
#output
[['learning',
  'nlp',
  'interesting',
  'exciting',
  'includes',
  'machine',
  'learning',
  'deep',
  'learning'],
 ['father', 'data', 'scientist', 'nlp', 'expert'],
 ['sister', 'good', 'exposure', 'android', 'development']]

Step 5-3 Preparing document term matrix

The code is below:
# Importing gensim
import gensim
from gensim import corpora
# Creating the term dictionary of our corpus, where every unique term is assigned an index.
dictionary = corpora.Dictionary(doc_clean)
# Converting a list of documents (corpus) into Document-Term Matrix using dictionary prepared above.
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
doc_term_matrix
#output
[[(0, 1), (1, 1), (2, 1), (3, 1), (4, 3), (5, 1), (6, 1)],
 [(6, 1), (7, 1), (8, 1), (9, 1), (10, 1)],
 [(11, 1), (12, 1), (13, 1), (14, 1), (15, 1)]]

Step 5-4 LDA model

The final part is to create the LDA model:
# Creating the object for LDA model using gensim library
Lda = gensim.models.ldamodel.LdaModel
# Running and Training LDA model on the document term matrix for 3 topics.
ldamodel = Lda(doc_term_matrix, num_topics=3, id2word = dictionary, passes=50)
# Results
print(ldamodel.print_topics())
#output
[(0, '0.063*"nlp" + 0.063*"father" + 0.063*"data" + 0.063*"scientist" + 0.063*"expert" + 0.063*"good" + 0.063*"exposure" + 0.063*"development" + 0.063*"android" + 0.063*"sister"'), (1, '0.232*"learning" + 0.093*"nlp" + 0.093*"deep" + 0.093*"includes" + 0.093*"interesting" + 0.093*"machine" + 0.093*"exciting" + 0.023*"scientist" + 0.023*"data" + 0.023*"father"'), (2, '0.087*"sister" + 0.087*"good" + 0.087*"exposure" + 0.087*"development" + 0.087*"android" + 0.087*"father" + 0.087*"scientist" + 0.087*"data" + 0.087*"expert" + 0.087*"nlp"')]

All the weights associated with the topics look similar here because the sample corpus is tiny. You can perform this on larger data to extract significant topics. The whole idea of implementing this on sample data is to make you familiar with it; you can use the same code snippet on much larger data for meaningful results and insights.
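Once the model is trained, you can also score an unseen document against the learned topics. Here is a small sketch that reuses the clean function and dictionary defined above; the new sentence is just an example:
#Topic distribution for a new, unseen document
new_doc = "Deep learning is a subset of machine learning"
new_bow = dictionary.doc2bow(clean(new_doc).split())
print(ldamodel.get_document_topics(new_bow))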

Recipe 4-6. Classifying Text

Text classification – The aim of text classification is to automatically classify text documents into predefined categories.

Applications:
  • Sentiment Analysis

  • Document classification

  • Spam – ham mail classification

  • Resume shortlisting

  • Document summarization

Problem

Spam - ham classification using machine learning.

Solution

If you observe, Gmail has a folder called "Spam." It basically classifies your emails into spam and ham so that you don't have to read unnecessary emails. We will build a similar classifier here using machine learning.

How It Works

Let’s follow the step-by-step method to build the classifier.

Step 6-1 Data collection and understanding

Please download data from the below link and save it in your working directory:

https://www.kaggle.com/uciml/sms-spam-collection-dataset#spam.csv
#Import pandas and read the data
import pandas as pd
Email_Data = pd.read_csv("spam.csv", encoding='latin1')
#Data understanding
Email_Data.columns
#output
Index(['v1', 'v2', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'], dtype="object")
Email_Data = Email_Data[['v1', 'v2']]
Email_Data = Email_Data.rename(columns={"v1":"Target", "v2":"Email"})
Email_Data.head()
#output
    Target   Email
0      ham   Go until jurong point, crazy.. Available only ...
1      ham   Ok lar... Joking wif u oni...
2      spam  Free entry in 2 a wkly comp to win FA Cup fina...
3      ham   U dun say so early hor... U c already then say...
4      ham   Nah I don't think he goes to usf, he lives aro...

Step 6-2 Text processing and feature engineering

The code is below:
#import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import string
from nltk.stem import SnowballStemmer
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
import os
from textblob import TextBlob
from nltk.stem import PorterStemmer
from textblob import Word
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
import sklearn.feature_extraction.text as text
from sklearn import model_selection, preprocessing, linear_model, naive_bayes, metrics, svm
#Preprocessing steps: lowercasing, stop word removal, stemming, and lemmatization
Email_Data['Email'] = Email_Data['Email'].apply(lambda x: " ".join(x.lower() for x in x.split()))
stop = stopwords.words('english')
Email_Data['Email'] = Email_Data['Email'].apply(lambda x: " ".join(x for x in x.split() if x not in stop))
st = PorterStemmer()
Email_Data['Email'] = Email_Data['Email'].apply(lambda x: " ".join([st.stem(word) for word in x.split()]))
Email_Data['Email'] = Email_Data['Email'].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))
Email_Data.head()
#output
  Target                                              Email
0    ham  go jurong point, crazy.. avail bugi n great wo...
1    ham                        ok lar... joke wif u oni...
2    spam free entri 2 wkli comp win fa cup final tkt 21...
3    ham          u dun say earli hor... u c alreadi say...
4    ham              nah think goe usf, live around though
#Splitting data into train and validation
train_x, valid_x, train_y, valid_y = model_selection.train_test_split(Email_Data['Email'], Email_Data['Target'])
# Label encode the target variable
encoder = preprocessing.LabelEncoder()
train_y = encoder.fit_transform(train_y)
valid_y = encoder.transform(valid_y)
# TFIDF feature generation for a maximum of 5000 features
tfidf_vect = TfidfVectorizer(analyzer='word', token_pattern=r'\w{1,}', max_features=5000)
tfidf_vect.fit(Email_Data['Email'])
xtrain_tfidf =  tfidf_vect.transform(train_x)
xvalid_tfidf =  tfidf_vect.transform(valid_x)
xtrain_tfidf.data
#output
array([0.39933971, 0.36719906, 0.60411187, ..., 0.36682939, 0.30602539, 0.38290119])

Step 6-3 Model training

This is the generalized function for training any given model:
def train_model(classifier, feature_vector_train, label, feature_vector_valid, is_neural_net=False):
    # fit the training dataset on the classifier
    classifier.fit(feature_vector_train, label)
    # predict the labels on validation dataset
    predictions = classifier.predict(feature_vector_valid)
    return metrics.accuracy_score(predictions, valid_y)
# Naive Bayes training
accuracy = train_model(naive_bayes.MultinomialNB(alpha=0.2), xtrain_tfidf, train_y, xvalid_tfidf)
print ("Accuracy: ", accuracy)
#output
Accuracy:  0.985642498205
# Linear Classifier on Word Level TF IDF Vectors
accuracy = train_model(linear_model.LogisticRegression(), xtrain_tfidf, train_y, xvalid_tfidf)
print ("Accuracy: ", accuracy)
#output
Accuracy:  0.970567121321

Naive Bayes is giving better results than the linear classifier. We can try many more classifiers and then choose the best one.
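Because spam/ham data is usually imbalanced, accuracy alone can be misleading; per-class precision and recall tell a fuller story. Here is a small sketch using the metrics module already imported above:
#Detailed evaluation of the Naive Bayes model on the validation set
model = naive_bayes.MultinomialNB(alpha=0.2)
model.fit(xtrain_tfidf, train_y)
predictions = model.predict(xvalid_tfidf)
print(metrics.confusion_matrix(valid_y, predictions))
print(metrics.classification_report(valid_y, predictions, target_names=encoder.classes_))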

Recipe 4-7. Carrying Out Sentiment Analysis

In this recipe, we are going to discuss how to understand the sentiment of a particular sentence or statement. Sentiment analysis is one of the most widely used techniques across industries to understand customer/user sentiment around products and services. Sentiment analysis gives a sentiment score indicating whether a sentence/statement tends toward positive or negative.

Problem

You want to do a sentiment analysis.

Solution

The simplest way to do this is by using the TextBlob or VADER library.

How It Works

Let's follow the steps in this section to do sentiment analysis using TextBlob. It basically gives two metrics.
  • Polarity : Polarity lies in the range of [-1,1], where 1 means a positive statement and -1 means a negative statement.

  • Subjectivity : Subjectivity lies in the range of [0,1] and indicates how much the text reflects personal opinion rather than factual information.

Step 7-1 Create the sample data

Here is the sample data:
review = "I like this phone. screen quality and camera clarity is really good."
review2 = "This tv is not good. Bad quality, no clarity, worst experience"

Step 7-2 Cleaning and preprocessing

Refer to Chapter 2, Recipe 2-10, for this step.

Step 7-3 Get the sentiment scores

Using a pretrained model from TextBlob to get the sentiment scores:
#import libraries
from textblob import TextBlob
#TextBlob has a pre trained sentiment prediction model
blob = TextBlob(review)
blob.sentiment
#output
Sentiment(polarity=0.7, subjectivity=0.6000000000000001)
It seems like a very positive review.
#now lets look at the sentiment of review2
blob = TextBlob(review2)
blob.sentiment
#output
Sentiment(polarity=-0.6833333333333332, subjectivity=0.7555555555555555)

This is a negative review, as the polarity is “-0.68.”
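As mentioned in the solution, VADER (available through NLTK) is another popular option, particularly for short, informal text. A minimal sketch, assuming the vader_lexicon has been downloaded:
#Sentiment scores using VADER from NLTK
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
print(analyzer.polarity_scores(review))
print(analyzer.polarity_scores(review2))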

Note: We will cover a real-time use case on sentiment analysis with an end-to-end implementation in the next chapter, Recipe 5-2.

Recipe 4-8. Disambiguating Text

Ambiguity arises because words can have different meanings in different contexts.

For example,
Text1 = 'I went to the bank to deposit my money'
Text2 = 'The river bank was full of dead fishes'

In the above texts, the word “bank” has different meanings based on the context of the sentence.

Problem

You want to understand and disambiguate word sense.

Solution

The Lesk algorithm is one of the best algorithms for word sense disambiguation. Let's see how to solve it using the pywsd and nltk packages.

How It Works

Below are the steps to achieve the results.

Step 8-1 Import libraries

First, import the libraries:
#Install pywsd
!pip install pywsd
#Import functions
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer
from itertools import chain
from pywsd.lesk import simple_lesk

Step 8-2 Disambiguating word sense

Now the code:
# Sentences
bank_sents = ['I went to the bank to deposit my money',
'The river bank was full of dead fishes']
# calling the lesk function and printing results for both the sentences
print ("Context-1:", bank_sents[0])
answer = simple_lesk(bank_sents[0],'bank')
print ("Sense:", answer)
print ("Definition : ", answer.definition())
print ("Context-2:", bank_sents[1])
answer = simple_lesk(bank_sents[1],'bank','n')
print ("Sense:", answer)
print ("Definition : ", answer.definition())
#Result:
Context-1: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition :  a financial institution that accepts deposits and channels the money into lending activities
Context-2: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition :  sloping land (especially the slope beside a body of water)

Observe that in context-1, “bank” is a financial institution, but in context-2, “bank” is sloping land.
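To see the full set of candidate senses that the Lesk algorithm chooses between, you can list the WordNet synsets for "bank." A short sketch using the wn import from above:
#List all WordNet senses of the word "bank"
for synset in wn.synsets('bank'):
    print(synset, "-", synset.definition())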

Recipe 4-9. Converting Speech to Text

Converting speech to text is a very useful NLP technique.

Problem

You want to convert speech to text.

Solution

The simplest way to do this is by using the SpeechRecognition and PyAudio libraries.

How It Works

Let’s follow the steps in this section to implement speech to text.

Step 9-1 Understanding/defining business problem

Interaction with machines is trending toward voice, which is the natural way humans communicate. Popular examples are Siri, Alexa, Google Assistant, etc.

Step 9-2 Install and import necessary libraries

Here are the libraries:
!pip install SpeechRecognition
!pip install PyAudio
import speech_recognition as sr

Step 9-3 Run below code

Now, after you run the below code snippet, whatever you say into the microphone gets converted into text using the recognize_google function.
r=sr.Recognizer()
with sr.Microphone() as source:
    print("Please say something")
    audio = r.listen(source)
    print("Time over, thanks")
try:
    print("I think you said: "+r.recognize_google(audio));
except:
    pass;
#output
Please say something
Time over, thanks
I think you said: I am learning natural language processing
This code works with the default language, English. If you speak in another language, for example Hindi, the text is transcribed in Latin script (as if it were English), as shown below:
#code snippet
r=sr.Recognizer()
with sr.Microphone() as source:
    print("Please say something")
    audio = r.listen(source)
    print("Time over, thanks")
try:
    print("I think you said: "+r.recognize_google(audio));
except:
    pass;
#output
Please say something
Time over, thanks
I think you said: aapka naam kya hai
If you want the text in the spoken language's script, run the below code snippet. The only change is the language parameter in recognize_google: language='hi-IN', which means Hindi.
#code snippet
r=sr.Recognizer()
with sr.Microphone() as source:
    print("Please say something")
    audio = r.listen(source)
    print("Time over, thanks")
try:
    print("I think you said: "+r.recognize_google(audio, language ='hi-IN'));
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
except:
    pass;
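If you have a pre-recorded audio file instead of a live microphone, the same recognizer can transcribe it. A minimal sketch (the file name here is just a placeholder; WAV, AIFF, and FLAC files are supported):
#Transcribe a pre-recorded audio file instead of the microphone
r = sr.Recognizer()
with sr.AudioFile('recording.wav') as source:   # placeholder file name
    audio = r.record(source)                    # read the entire file
try:
    print("I think the file says: " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")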

Recipe 4-10. Converting Text to Speech

Converting text to speech is another useful NLP technique.

Problem

You want to convert text to speech.

Solution

The simplest way to do this is by using the gTTS library.

How It Works

Let’s follow the steps in this section to implement text to speech.

Step 10-1 Install and import necessary libraries

Here are the libraries:
!pip install gTTS
from gtts import gTTS

Step 10-2 Run below code, gTTS function

Now after you run the below code snippet, whatever you pass in the text parameter gets converted into audio.
#chooses the language, English('en')
convert = gTTS(text='I like this NLP book', lang="en", slow=False)
# Saving the converted audio in an mp3 file named audio.mp3
convert.save("audio.mp3")
#output
Play the audio.mp3 file saved on your local machine to hear the audio.
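If you are working in a Jupyter notebook, you can also play the generated file inline rather than opening it manually; one option is IPython's audio widget. A small sketch:
#Play the generated file inline in a Jupyter notebook
from IPython.display import Audio
Audio("audio.mp3")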

Recipe 4-11. Translating Speech

Language detection and translation.

Problem

Whenever you try to analyze data from blogs hosted across the globe, especially websites from countries like China where Chinese is used predominantly, analyzing such data or performing NLP tasks on it is difficult. That's where language translation comes to the rescue. You want to translate one language to another.

Solution

The easiest way to do this is by using the goslate library.

How It Works

Let’s follow the steps in this section to implement language translation in Python.

Step 11-1 Install and import necessary libraries

Here are the libraries:
!pip install goslate
import goslate

Step 11-2 Input text

A simple phrase:
text = "Bonjour le monde"

Step 11-3 Run goslate function

The translation function:
gs = goslate.Goslate()
translatedText = gs.translate(text,'en')
print(translatedText)
#output
Hi world
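The recipe title also mentions language detection. goslate handles the translation; for detecting the source language, one option is the langdetect library (a separate package, shown here only as an illustrative sketch), which should report 'fr' for the French sentence above:
#Detect the language of the input text before translating
!pip install langdetect
from langdetect import detect
print(detect(text))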

Well, that feels like an accomplishment, doesn't it? We have implemented many advanced NLP applications and techniques. And that's not all, folks: we have a couple more interesting chapters ahead, where we will look at industrial applications of NLP, their solution approaches, and end-to-end implementations.
