by Rohan Chopra, Aniruddha M. Godbole, Nipun Sadvilkar,
The Natural Language Processing Workshop
Preface
About the Book
Audience
About the Chapters
Conventions
Code Presentation
Setting up Your Environment
Installation and Setup
Installing the Required Libraries
Installing Libraries
Accessing the Code Files
1. Introduction to Natural Language Processing
Introduction
History of NLP
Text Analytics and NLP
Exercise 1.01: Basic Text Analytics
Various Steps in NLP
Tokenization
Exercise 1.02: Tokenization of a Simple Sentence
PoS Tagging
Exercise 1.03: PoS Tagging
Stop Word Removal
Exercise 1.04: Stop Word Removal
Text Normalization
Exercise 1.05: Text Normalization
Spelling Correction
Exercise 1.06: Spelling Correction of a Word and a Sentence
Stemming
Exercise 1.07: Using Stemming
Lemmatization
Exercise 1.08: Extracting the Base Word Using Lemmatization
Named Entity Recognition (NER)
Exercise 1.09: Treating Named Entities
Word Sense Disambiguation
Exercise 1.10: Word Sense Disambiguation
Sentence Boundary Detection
Exercise 1.11: Sentence Boundary Detection
Activity 1.01: Preprocessing of Raw Text
Kick Starting an NLP Project
Data Collection
Data Preprocessing
Feature Extraction
Model Development
Model Assessment
Model Deployment
Summary
2. Feature Extraction Methods
Introduction
Types of Data
Categorizing Data Based on Structure
Categorizing Data Based on Content
Cleaning Text Data
Tokenization
Exercise 2.01: Text Cleaning and Tokenization
Exercise 2.02: Extracting n-grams
Exercise 2.03: Tokenizing Text with Keras and TextBlob
Types of Tokenizers
Exercise 2.04: Tokenizing Text Using Various Tokenizers
Stemming
RegexpStemmer
Exercise 2.05: Converting Words in the Present Continuous Tense into Base Words with RegexpStemmer
The Porter Stemmer
Exercise 2.06: Using the Porter Stemmer
Lemmatization
Exercise 2.07: Performing Lemmatization
Exercise 2.08: Singularizing and Pluralizing Words
Language Translation
Exercise 2.09: Language Translation
Stop-Word Removal
Exercise 2.10: Removing Stop Words from Text
Activity 2.01: Extracting Top Keywords from the News Article
Feature Extraction from Texts
Extracting General Features from Raw Text
Exercise 2.11: Extracting General Features from Raw Text
Exercise 2.12: Extracting General Features from Text
Bag of Words (BoW)
Exercise 2.13: Creating a Bag of Words
Zipf's Law
Exercise 2.14: Zipf's Law
Term Frequency–Inverse Document Frequency (TFIDF)
Exercise 2.15: TFIDF Representation
Finding Text Similarity – Application of Feature Extraction
Exercise 2.16: Calculating Text Similarity Using Jaccard and Cosine Similarity
Word Sense Disambiguation Using the Lesk Algorithm
Exercise 2.17: Implementing the Lesk Algorithm Using String Similarity and Text Vectorization
Word Clouds
Exercise 2.18: Generating Word Clouds
Other Visualizations
Exercise 2.19: Other Visualizations – Dependency Parse Trees and Named Entities
Activity 2.02: Text Visualization
Summary
3. Developing a Text Classifier
Introduction
Machine Learning
Unsupervised Learning
Hierarchical Clustering
Exercise 3.01: Performing Hierarchical Clustering
k-means Clustering
Exercise 3.02: Implementing k-means Clustering
Supervised Learning
Classification
Logistic Regression
Exercise 3.03: Text Classification – Logistic Regression
Naive Bayes Classifiers
Exercise 3.04: Text Classification – Naive Bayes
k-nearest Neighbors
Exercise 3.05: Text Classification Using the k-nearest Neighbors Method
Regression
Linear Regression
Exercise 3.06: Regression Analysis Using Textual Data
Tree Methods
Exercise 3.07: Tree-Based Methods – Decision Tree
Random Forest
Gradient Boosting Machine and Extreme Gradient Boost
Exercise 3.08: Tree-Based Methods – Random Forest
Exercise 3.09: Tree-Based Methods – XGBoost
Sampling
Exercise 3.10: Sampling (Simple Random, Stratified, and Multi-Stage)
Developing a Text Classifier
Feature Extraction
Feature Engineering
Removing Correlated Features
Exercise 3.11: Removing Highly Correlated Features (Tokens)
Dimensionality Reduction
Exercise 3.12: Performing Dimensionality Reduction Using Principal Component Analysis
Deciding on a Model Type
Evaluating the Performance of a Model
Exercise 3.13: Calculating the RMSE and MAPE of a Dataset
Activity 3.01: Developing End-to-End Text Classifiers
Building Pipelines for NLP Projects
Exercise 3.14: Building the Pipeline for an NLP Project
Saving and Loading Models
Exercise 3.15: Saving and Loading Models
Summary
4. Collecting Text Data with Web Scraping and APIs
Introduction
Collecting Data by Scraping Web Pages
Exercise 4.01: Extraction of Tag-Based Information from HTML Files
Requesting Content from Web Pages
Exercise 4.02: Collecting Online Text Data
Exercise 4.03: Analyzing the Content of Jupyter Notebooks (in HTML Format)
Activity 4.01: Extracting Information from an Online HTML Page
Activity 4.02: Extracting and Analyzing Data Using Regular Expressions
Dealing with Semi-Structured Data
JSON
Exercise 4.04: Working with JSON Files
XML
Exercise 4.05: Working with an XML File
Using APIs to Retrieve Real-Time Data
Exercise 4.06: Collecting Data Using APIs
Extracting Data from Twitter Using the OAuth API
Activity 4.03: Extracting Data from Twitter
Summary
5. Topic Modeling
Introduction
Topic Discovery
Exploratory Data Analysis
Transforming Unstructured Data to Structured Data
Bag of Words
Topic-Modeling Algorithms
Latent Semantic Analysis (LSA)
LSA – How It Works
Key Input Parameters for LSA Topic Modeling
Exercise 5.01: Analyzing Wikipedia World Cup Articles with Latent Semantic Analysis
Dirichlet Process and Dirichlet Distribution
Latent Dirichlet Allocation (LDA)
LDA – How It Works
Measuring the Predictive Power of a Generative Topic Model
Exercise 5.02: Finding Topics in Canadian Open Data Inventory Using the LDA Model
Activity 5.01: Topic-Modeling Jeopardy Questions
Hierarchical Dirichlet Process (HDP)
Exercise 5.03: Topics in Around the World in Eighty Days
Exercise 5.04: Topics in The Life and Adventures of Robinson Crusoe by Daniel Defoe
Practical Challenges
State-of-the-Art Topic Modeling
Activity 5.02: Comparing Different Topic Models
Summary
6. Vector Representation
Introduction
What Is a Vector?
Frequency-Based Embeddings
Exercise 6.01: Word-Level One-Hot Encoding
Character-Level One-Hot Encoding
Exercise 6.02: Character One-Hot Encoding – Manual
Exercise 6.03: Character-Level One-Hot Encoding with Keras
Learned Word Embeddings
Word2Vec
Exercise 6.04: Training Word Vectors
Using Pre-Trained Word Vectors
Exercise 6.05: Using Pre-Trained Word Vectors
Document Vectors
Uses of Document Vectors
Exercise 6.06: Converting News Headlines to Document Vectors
Activity 6.01: Finding Similar News Article Using Document Vectors
Summary
7. Text Generation and Summarization
Introduction
Generating Text with Markov Chains
Markov Chains
Exercise 7.01: Text Generation Using a Random Walk over a Markov Chain
Text Summarization
TextRank
Key Input Parameters for TextRank
Exercise 7.02: Performing Summarization Using TextRank
Exercise 7.03: Summarizing a Children's Fairy Tale Using TextRank
Activity 7.01: Summarizing Complaints in the Consumer Financial Protection Bureau Dataset
Recent Developments in Text Generation and Summarization
Practical Challenges in Extractive Summarization
Summary
8. Sentiment Analysis
Introduction
Why Is Sentiment Analysis Required?
The Growth of Sentiment Analysis
The Monetization of Emotion
Types of Sentiments
Emotion
Key Ideas and Terms
Applications of Sentiment Analysis
Tools Used for Sentiment Analysis
NLP Services from Major Cloud Providers
Online Marketplaces
Python NLP Libraries
Deep Learning Frameworks
The textblob Library
Exercise 8.01: Basic Sentiment Analysis Using the textblob Library
Activity 8.01: Tweet Sentiment Analysis Using the textblob Library
Understanding Data for Sentiment Analysis
Exercise 8.02: Loading Data for Sentiment Analysis
Training Sentiment Models
Activity 8.02: Training a Sentiment Model Using TFIDF and Logistic Regression
Summary
Appendix
1. Introduction to Natural Language Processing
Activity 1.01: Preprocessing of Raw Text
2. Feature Extraction Methods
Activity 2.01: Extracting Top Keywords from the News Article
Activity 2.02: Text Visualization
3. Developing a Text Classifier
Activity 3.01: Developing End-to-End Text Classifiers
4. Collecting Text Data with Web Scraping and APIs
Activity 4.01: Extracting Information from an Online HTML Page
Activity 4.02: Extracting and Analyzing Data Using Regular Expressions
Activity 4.03: Extracting Data from Twitter
5. Topic Modeling
Activity 5.01: Topic-Modeling Jeopardy Questions
Activity 5.02: Comparing Different Topic Models
6. Vector Representation
Activity 6.01: Finding Similar News Article Using Document Vectors
7. Text Generation and Summarization
Activity 7.01: Summarizing Complaints in the Consumer Financial Protection Bureau Dataset
8. Sentiment Analysis
Activity 8.01: Tweet Sentiment Analysis Using the textblob Library
Activity 8.02: Training a Sentiment Model Using TFIDF and Logistic Regression