This book provides a blend of both the theoretical and practical aspects of Natural Language Processing (NLP). It covers the concepts essential to develop a thorough understanding of NLP and also delves into a detailed discussion on NLP-based use cases such as language translation, sentiment analysis, chatbots, and many more. The book also goes into the details of the application of machine learning and deep learning in improving the efficiency of NLP applications and introduces readers to the recent developments in this field. Every module covers real-world examples that can be replicated and built upon.

Who this book is for

This book is for anyone interested in NLP who is seeking to learn about its theoretical and practical aspects alike. The book starts from the basics and gradually progresses to more advanced concepts, making it suitable for an audience with varying levels of prior NLP proficiency, and for those who want to develop a thorough understanding of NLP methodologies to build linguistic applications. However, a working knowledge of the Python programming language and high-school-level mathematics is expected.

What this book covers

Chapter 1, Understanding the Basics of NLP, will introduce you to the past, present, and future of NLP research and applications.

Chapter 2, NLP Using Python, will gently introduce you to the Python libraries that are used frequently in NLP and that we will use later in the book.

Chapter 3, Building Your NLP Vocabulary, will introduce you to methodologies for natural language data cleaning and vocabulary building.

Chapter 4, Transforming Text into Data Structures, will discuss basic syntactical techniques for representing text using numbers and building a chatbot.

Chapter 5, Word Embeddings and Distance Measurements for Text, will introduce you to word-level semantic embedding creation and establishing the similarity between documents.

Chapter 6, Exploring Sentence-, Document-, and Character-Level Embeddings, will dive deeper into techniques for embedding creation at character, sentence, and document level, along with building a spellchecker.

Chapter 7, Identifying Patterns in Text Using Machine Learning, will use machine learning algorithms to build a sentiment analyzer.

Chapter 8, From Human Neurons to Artificial Neurons for Understanding Text, will introduce you to the concepts of deep learning and how they are used for NLP tasks such as question classification.

Chapter 9, Applying Convolutions to Text, will discuss how convolutions can be used to extract patterns in text data for solving NLP problems such as sarcasm detection.

Chapter 10, Capturing Temporal Relationships in Text, will explain how to extract sequential relationships prevalent in text data and build a text generator using them.

Chapter 11, State of the Art in NLP, will discuss recent concepts, including Seq2Seq modeling, attention, transformers, and BERT, and will also see us building a language translator.

To get the most out of this book

You will need Python 3 installed on your system. You can use any IDE to practice the code samples provided in the book, but since the code samples are provided as Jupyter notebooks, we recommend installing Jupyter. All code examples have been tested on Windows. However, the programs are platform agnostic and should work on other 32/64-bit OSes as well. Other system requirements include at least 4 GB of RAM and at least 6 GB of free disk space.

We recommend installing the Python libraries discussed in this book using pip or conda. The code snippets in the book mention the relevant command to install a given library on the Windows OS. Please refer to the source page of the library for installation instructions for other OSes.

Software/hardware covered in the book	OS requirements
pandas	Windows 7 or later, macOS, Linux
NumPy	Windows 7 or later, macOS, Linux
Jupyter	Windows 7 or later, macOS, Linux
beautifulsoup4	Windows 7 or later, macOS, Linux
scikit-learn	Windows 7 or later, macOS, Linux
Keras	Windows 7 or later, macOS, Linux
NLTK	Windows 7 or later, macOS, Linux
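
Once the libraries are installed, a quick way to confirm the setup is to query the installed version of each package. The following is a minimal sketch and is not taken from the book's code bundle. It assumes Python 3.8 or later (for importlib.metadata) and uses the PyPI distribution names of the libraries listed above; the helper name check_installed is our own.

```python
from importlib import metadata

# PyPI distribution names for the libraries listed in the table above.
REQUIRED = [
    "pandas",
    "numpy",
    "jupyter",
    "beautifulsoup4",
    "scikit-learn",
    "keras",
    "nltk",
]


def check_installed(names):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the current environment.
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_installed(REQUIRED).items():
        print(f"{name}: {version or 'not installed'}")
```

Any library reported as not installed can then be added with pip or conda, as described above.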

The last project covered in this book requires a higher-spec machine. However, you can run the program on a Google Colab GPU machine if need be.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.
Select the Support tab.
Click on Code Downloads.
Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Python-Natural-Language-Processing. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838989590_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We will be performing preprocessing on the Tips dataset, which comes with the seaborn Python package."

A block of code is set as follows:

import pandas as pd
data = pd.read_csv("amazon_cells_labelled.txt", sep='\t', header=None)

X = data.iloc[:,0] # extract column with review
y = data.iloc[:,-1] # extract column with sentiment

# tokenize the review text and convert the data into matrix format
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(stop_words='english')
X_vec = vectorizer.fit_transform(X)
X_vec = X_vec.toarray() # convert sparse matrix into a dense NumPy array

# Transform data by applying term frequency inverse document frequency (TFIDF)
from sklearn.feature_extraction.text import TfidfTransformer
tfidf = TfidfTransformer()
X_tfidf = tfidf.fit_transform(X_vec)
X_tfidf = X_tfidf.toarray() # convert sparse matrix into a dense NumPy array

Any command-line input or output is written as follows:

pip install requests
pip install beautifulsoup4

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "This is called cross-validation and is an important part of ML model training."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.
