
Build end-to-end industrial-strength NLP models using advanced morphological and syntactic features in spaCy to create real-world applications with ease

Key Features

  • Gain an overview of what spaCy offers for natural language processing
  • Learn details of spaCy's features and how to use them effectively
  • Work through practical recipes using spaCy

Book Description

spaCy is an industrial-grade, efficient Python library for NLP. It offers a range of pre-trained models and ready-to-use features. Mastering spaCy gives you end-to-end coverage of spaCy's features and real-world applications.

You'll begin by installing spaCy and downloading models, then move on to spaCy's features and to prototyping real-world NLP apps. Next, you'll get familiar with visualization using displaCy, spaCy's visualizer. The book also provides practical illustrations of pattern matching and takes you into the world of semantics with word vectors. Statistical information extraction methods are explained in detail. Later, you'll work through an interactive business case study that shows how to combine spaCy's features into a real-world NLP pipeline. You'll implement ML models for tasks such as sentiment analysis, intent recognition, and context resolution. The book then focuses on classification with spaCy together with frameworks such as TensorFlow's Keras API. You'll apply intent classification and sentiment analysis to widely used datasets and interpret the classification results.

By the end of this book, you'll be able to confidently use spaCy, including its linguistic features, word vectors, and classifiers, to create your own NLP apps.

What you will learn

  • Install spaCy, get started easily, and write your first Python script
  • Understand core linguistic operations of spaCy
  • Discover how to combine rule-based components with spaCy statistical models
  • Become well-versed with named entity and keyword extraction
  • Build your own ML pipelines using spaCy
  • Apply all the knowledge you've gained to design a chatbot using spaCy

Who this book is for

This book is for data scientists and machine learning practitioners who want to excel in NLP, as well as NLP developers who want to master spaCy and build applications with it. Language and speech professionals who want to get hands-on with Python and spaCy, and software developers who want to quickly prototype applications with spaCy, will also find this book helpful. Beginner-level knowledge of the Python programming language is required to get the most out of this book. A beginner-level understanding of linguistic concepts such as parsing, POS tags, and semantic similarity will also be useful.

Table of Contents

  1. Mastering spaCy
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  6. Section 1: Getting Started with spaCy
  7. Chapter 1: Getting Started with spaCy
    1. Technical requirements
    2. Overview of spaCy
    3. Rise of NLP
    4. NLP with Python
    5. Reviewing some useful string operations
    6. Getting a high-level overview of the spaCy library
    7. Tips for the reader
    8. Installing spaCy
    9. Installing spaCy with pip
    10. Installing spaCy with conda
    11. Installing spaCy on macOS/OS X
    12. Installing spaCy on Windows
    13. Troubleshooting while installing spaCy
    14. Installing spaCy's statistical models
    15. Installing language models
    16. Visualization with displaCy
    17. Getting started with displaCy
    18. Entity visualizer
    19. Visualizing within Python
    20. Using displaCy in Jupyter notebooks
    21. Exporting displaCy graphics as an image file
    22. Summary
  8. Chapter 2: Core Operations with spaCy
    1. Technical requirements
    2. Overview of spaCy conventions
    3. Introducing tokenization
    4. Customizing the tokenizer
    5. Debugging the tokenizer
    6. Sentence segmentation
    7. Understanding lemmatization
    8. Lemmatization in NLU
    9. Understanding the difference between lemmatization and stemming
    10. spaCy container objects
    11. Doc
    12. Token
    13. Span
    14. More spaCy features
    15. Summary
  9. Section 2: spaCy Features
  10. Chapter 3: Linguistic Features
    1. Technical requirements
    2. What is POS tagging?
    3. WSD
    4. Verb tense and aspect in NLU applications
    5. Understanding number, symbol, and punctuation tags
    6. Introduction to dependency parsing
    7. What is dependency parsing?
    8. Dependency relations
    9. Syntactic relations
    10. Introducing NER
    11. A real-world example
    12. Merging and splitting tokens
    13. Summary
  11. Chapter 4: Rule-Based Matching
    1. Token-based matching
    2. Extended syntax support
    3. Regex-like operators
    4. Regex support
    5. Matcher online demo
    6. PhraseMatcher
    7. EntityRuler
    8. Combining spaCy models and matchers
    9. Extracting IBAN and account numbers
    10. Extracting phone numbers
    11. Extracting mentions
    12. Hashtag and emoji extraction
    13. Expanding named entities
    14. Combining linguistic features and named entities
    15. Summary
  12. Chapter 5: Working with Word Vectors and Semantic Similarity
    1. Technical requirements
    2. Understanding word vectors
    3. One-hot encoding
    4. Word vectors
    5. Analogies and vector operations
    6. How word vectors are produced
    7. Using spaCy's pretrained vectors
    8. The similarity method
    9. Using third-party word vectors
    10. Advanced semantic similarity methods
    11. Understanding semantic similarity
    12. Categorizing text with semantic similarity
    13. Extracting key phrases
    14. Extracting and comparing named entities
    15. Summary
  13. Chapter 6: Putting Everything Together: Semantic Parsing with spaCy
    1. Technical requirements
    2. Extracting named entities
    3. Getting to know the ATIS dataset
    4. Extracting named entities with Matcher
    5. Using dependency trees for extracting entities
    6. Using dependency relations for intent recognition
    7. Linguistic primer
    8. Extracting transitive verbs and their direct objects
    9. Extracting multiple intents with conjunction relation
    10. Recognizing the intent using wordlists
    11. Semantic similarity methods for semantic parsing
    12. Using synonyms lists for semantic similarity
    13. Using word vectors to recognize semantic similarity
    14. Putting it all together
    15. Summary
  14. Section 3: Machine Learning with spaCy
  15. Chapter 7: Customizing spaCy Models
    1. Technical requirements
    2. Getting started with data preparation
    3. Do spaCy models perform well enough on your data?
    4. Does your domain include many labels that are absent in spaCy models?
    5. Annotating and preparing data
    6. Annotating data with Prodigy
    7. Annotating data with Brat
    8. spaCy training data format
    9. Updating an existing pipeline component
    10. Disabling the other statistical models
    11. Model training procedure
    12. Evaluating the updated NER
    13. Saving and loading custom models
    14. Training a pipeline component from scratch
    15. Working with a real-world dataset
    16. Summary
  16. Chapter 8: Text Classification with spaCy
    1. Technical requirements
    2. Understanding the basics of text classification
    3. Training the spaCy text classifier
    4. Getting to know TextCategorizer class
    5. Formatting training data for the TextCategorizer
    6. Defining the training loop
    7. Testing the new component
    8. Training TextCategorizer for multilabel classification
    9. Sentiment analysis with spaCy
    10. Exploring the dataset
    11. Training the TextClassifier component
    12. Text classification with spaCy and Keras
    13. What is a layer?
    14. Sequential modeling with LSTMs
    15. Keras Tokenizer
    16. Embedding words
    17. Neural network architecture for text classification
    18. Summary
    19. References
  17. Chapter 9: spaCy and Transformers
    1. Technical requirements
    2. Transformers and transfer learning
    3. Understanding BERT
    4. BERT architecture
    5. BERT input format
    6. How is BERT trained?
    7. Transformers and TensorFlow
    8. HuggingFace Transformers
    9. Using the BERT tokenizer
    10. Obtaining BERT word vectors
    11. Using BERT for text classification
    12. Using Transformer pipelines
    13. Transformers and spaCy
    14. Summary
  18. Chapter 10: Putting Everything Together: Designing Your Chatbot with spaCy
    1. Technical requirements
    2. Introduction to conversational AI
    3. NLP components of conversational AI products
    4. Getting to know the dataset
    5. Entity extraction
    6. Extracting city entities
    7. Extracting date and time entities
    8. Extracting phone numbers
    9. Extracting cuisine types
    10. Intent recognition
    11. Pattern-based text classification
    12. Classifying text with a character-level LSTM
    13. Differentiating subjects from objects
    14. Parsing the sentence type
    15. Anaphora resolution
    16. Summary
    17. References
    18. Why subscribe?
  19. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Leave a review - let other readers know what you think