Take your NLP knowledge to the next level and become an AI language understanding expert by mastering the quantum leap that Transformer neural network models represent

Key Features

  • Build and implement state-of-the-art language models, such as the original Transformer, BERT, T5, and GPT-2, with architectures that outperform classical deep learning models
  • Work through hands-on applications in Python using Google Colaboratory notebooks, with nothing to install on a local machine
  • Test transformer models on advanced use cases

Book Description

The transformer architecture has proved to be revolutionary, outperforming the classical RNN and CNN models in use today. With an apply-as-you-learn approach, Transformers for Natural Language Processing investigates in detail how deep learning with transformers applies to machine translation, speech-to-text, text-to-speech, language modeling, question answering, and many other NLP domains.

The book takes you through NLP with Python and examines various eminent transformer models and datasets created by pioneers such as Google, Facebook, Microsoft, OpenAI, and Hugging Face.

The book trains you in three stages. The first stage introduces you to transformer architectures, starting with the original Transformer before moving on to the BERT, RoBERTa, and DistilBERT models; you will discover training methods for smaller transformers that can outperform GPT-3 in some cases. In the second stage, you will apply transformers to Natural Language Understanding (NLU) and Natural Language Generation (NLG). Finally, the third stage helps you grasp advanced language understanding techniques such as optimizing social network datasets and fake news identification.

By the end of this NLP book, you will understand transformers from a cognitive science perspective and be proficient in applying pretrained transformer models by tech giants to various datasets.
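
For a flavor of what applying a pretrained transformer looks like in practice, here is a minimal sketch, not taken from the book, that runs a pretrained BERT checkpoint through a Hugging Face fill-mask pipeline. It assumes the transformers library is installed (for example, with pip install transformers) and that the bert-base-uncased checkpoint can be downloaded from the Hugging Face hub.

    # Minimal sketch (not from the book): applying a pretrained masked language model.
    # Assumes: pip install transformers, and access to the bert-base-uncased checkpoint.
    from transformers import pipeline

    # Wrap a pretrained BERT model in a fill-mask pipeline.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # Ask the model to complete a masked sentence and print its top suggestions.
    for prediction in fill_mask("The transformer architecture relies on [MASK] mechanisms."):
        print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")

The same pipeline API generalizes to other tasks covered in the book, such as summarization, translation, and question answering.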

What you will learn

  • Use the latest pretrained transformer models
  • Grasp the workings of the original Transformer, GPT-2, BERT, T5, and other transformer models
  • Create language understanding Python programs with transformer architectures that outperform classical deep learning models
  • Use a variety of NLP platforms, including Hugging Face, Trax, and AllenNLP
  • Apply Python, TensorFlow, and Keras programs to sentiment analysis, text summarization, speech recognition, machine translation, and more (see the short sentiment analysis sketch after this list)
  • Measure the productivity of key transformers to define their scope, potential, and limits in production
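
As an example of the hands-on style, the following is a minimal sketch, not code from the book, of sentiment analysis with a pretrained DistilBERT checkpoint through the Hugging Face pipeline API. It assumes the transformers library is installed and that the distilbert-base-uncased-finetuned-sst-2-english checkpoint is available from the Hugging Face hub.

    # Minimal sketch (not from the book): sentiment analysis with a pretrained
    # DistilBERT checkpoint fine-tuned on SST-2, via the Hugging Face pipeline API.
    # Assumes: pip install transformers
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    reviews = [
        "The service was quick and the staff were friendly.",
        "The product broke after two days and support never replied.",
    ]

    # Each result is a dict with a label (POSITIVE/NEGATIVE) and a confidence score.
    for review, result in zip(reviews, classifier(reviews)):
        print(f"{result['label']:<8} {result['score']:.3f}  {review}")

Swapping the model argument for another checkpoint is enough to compare models such as those discussed in Chapter 11, Detecting Customer Emotions to Make Predictions.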

Who this book is for

Since the book does not teach basic programming, you must already be familiar with neural networks, Python, PyTorch, and TensorFlow in order to follow their implementation with transformers.

Readers who can benefit the most from this book include experienced deep learning and NLP practitioners, as well as data analysts and data scientists who want to process the increasing amounts of language-driven data.

Table of Contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Get in touch
  2. Getting Started with the Model Architecture of the Transformer
    1. The background of the Transformer
    2. The rise of the Transformer: Attention Is All You Need
    3. The encoder stack
    4. Input embedding
    5. Positional encoding
    6. Sub-layer 1: Multi-head attention
    7. Sub-layer 2: Feedforward network
    8. The decoder stack
    9. Output embedding and position encoding
    10. The attention layers
    11. The FFN sub-layer, the Post-LN, and the linear layer
    12. Training and performance
    13. Before we end the chapter
    14. Summary
    15. Questions
    16. References
  3. Fine-Tuning BERT Models
    1. The architecture of BERT
    2. The encoder stack
    3. Preparing the pretraining input environment
    4. Pretraining and fine-tuning a BERT model
    5. Fine-tuning BERT
    6. Activating the GPU
    7. Installing the Hugging Face PyTorch interface for BERT
    8. Importing the modules
    9. Specifying CUDA as the device for torch
    10. Loading the dataset
    11. Creating sentences, label lists, and adding BERT tokens
    12. Activating the BERT tokenizer
    13. Processing the data
    14. Creating attention masks
    15. Splitting data into training and validation sets
    16. Converting all the data into torch tensors
    17. Selecting a batch size and creating an iterator
    18. BERT model configuration
    19. Loading the Hugging Face BERT uncased base model
    20. Optimizer grouped parameters
    21. The hyperparameters for the training loop
    22. The training loop
    23. Training evaluation
    24. Predicting and evaluating using the holdout dataset
    25. Evaluating using Matthews Correlation Coefficient
    26. The score of individual batches
    27. Matthews evaluation for the whole dataset
    28. Summary
    29. Questions
    30. References
  4. Pretraining a RoBERTa Model from Scratch
    1. Training a tokenizer and pretraining a transformer
    2. Building KantaiBERT from scratch
    3. Step 1: Loading the dataset
    4. Step 2: Installing Hugging Face transformers
    5. Step 3: Training a tokenizer
    6. Step 4: Saving the files to disk
    7. Step 5: Loading the trained tokenizer files
    8. Step 6: Checking resource constraints: GPU and CUDA
    9. Step 7: Defining the configuration of the model
    10. Step 8: Reloading the tokenizer in transformers
    11. Step 9: Initializing a model from scratch
    12. Exploring the parameters
    13. Step 10: Building the dataset
    14. Step 11: Defining a data collator
    15. Step 12: Initializing the trainer
    16. Step 13: Pretraining the model
    17. Step 14: Saving the final model (+tokenizer + config) to disk
    18. Step 15: Language modeling with FillMaskPipeline
    19. Next steps
    20. Summary
    21. Questions
    22. References
  5. Downstream NLP Tasks with Transformers
    1. Transduction and the inductive inheritance of transformers
    2. The human intelligence stack
    3. The machine intelligence stack
    4. Transformer performances versus Human Baselines
    5. Evaluating models with metrics
    6. Accuracy score
    7. F1-score
    8. Matthews Correlation Coefficient (MCC)
    9. Benchmark tasks and datasets
    10. From GLUE to SuperGLUE
    11. Introducing higher Human Baseline standards
    12. The SuperGLUE evaluation process
    13. Defining the SuperGLUE benchmark tasks
    14. BoolQ
    15. Commitment Bank (CB)
    16. Multi-Sentence Reading Comprehension (MultiRC)
    17. Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD)
    18. Recognizing Textual Entailment (RTE)
    19. Words in Context (WiC)
    20. The Winograd Schema Challenge (WSC)
    21. Running downstream tasks
    22. The Corpus of Linguistic Acceptability (CoLA)
    23. Stanford Sentiment TreeBank (SST-2)
    24. Microsoft Research Paraphrase Corpus (MRPC)
    25. Winograd schemas
    26. Summary
    27. Questions
    28. References
  6. Machine Translation with the Transformer
    1. Defining machine translation
    2. Human transductions and translations
    3. Machine transductions and translations
    4. Preprocessing a WMT dataset
    5. Preprocessing the raw data
    6. Finalizing the preprocessing of the datasets
    7. Evaluating machine translation with BLEU
    8. Geometric evaluations
    9. Applying a smoothing technique
    10. Chencherry smoothing
    11. Translations with Trax
    12. Installing Trax
    13. Creating a Transformer model
    14. Initializing the model using pretrained weights
    15. Tokenizing a sentence
    16. Decoding from the Transformer
    17. De-tokenizing and displaying the translation
    18. Summary
    19. Questions
    20. References
  7. Text Generation with OpenAI GPT-2 and GPT-3 Models
    1. The rise of billion-parameter transformer models
    2. The increasing size of transformer models
    3. Context size and maximum path length
    4. Transformers, reformers, PET, or GPT?
    5. The limits of the original Transformer architecture
    6. Running BertViz
    7. The Reformer
    8. Pattern-Exploiting Training (PET)
    9. The philosophy of Pattern-Exploiting Training (PET)
    10. It's time to make a decision
    11. The architecture of OpenAI GPT models
    12. From fine-tuning to zero-shot models
    13. Stacking decoder layers
    14. Text completion with GPT-2
    15. Step 1: Activating the GPU
    16. Step 2: Cloning the OpenAI GPT-2 repository
    17. Step 3: Installing the requirements
    18. Step 4: Checking the version of TensorFlow
    19. Step 5: Downloading the 345M parameter GPT-2 model
    20. Steps 6-7: Intermediate instructions
    21. Steps 7b-8: Importing and defining the model
    22. Step 9: Interacting with GPT-2
    23. Training a GPT-2 language model
    24. Step 1: Prerequisites
    25. Steps 2 to 6: Initial steps of the training process
    26. Step 7: The N Shepperd training files
    27. Step 8: Encoding the dataset
    28. Step 9: Training the model
    29. Step 10: Creating a training model directory
    30. Context and completion examples
    31. Generating music with transformers
    32. Summary
    33. Questions
    34. References
  8. Applying Transformers to Legal and Financial Documents for AI Text Summarization
    1. Designing a universal text-to-text model
    2. The rise of text-to-text transformer models
    3. A prefix instead of task-specific formats
    4. The T5 model
    5. Text summarization with T5
    6. Hugging Face
    7. Hugging Face transformer resources
    8. Initializing the T5-large transformer model
    9. Getting started with T5
    10. Exploring the architecture of the T5 model
    11. Summarizing documents with T5-large
    12. Creating a summarization function
    13. A general topic sample
    14. The Bill of Rights sample
    15. A corporate law sample
    16. Summary
    17. Questions
    18. References
  9. Matching Tokenizers and Datasets
    1. Matching datasets and tokenizers
    2. Best practices
    3. Step 1: Preprocessing
    4. Step 2: Post-processing
    5. Continuous human quality control
    6. Word2Vec tokenization
    7. Case 0: Words in the dataset and the dictionary
    8. Case 1: Words not in the dataset or the dictionary
    9. Case 2: Noisy relationships
    10. Case 3: Rare words
    11. Case 4: Replacing rare words
    12. Case 5: Entailment
    13. Standard NLP tasks with specific vocabulary
    14. Generating unconditional samples with GPT-2
    15. Controlling tokenized data
    16. Generating trained conditional samples
    17. T5 Bill of Rights Sample
    18. Summarizing the Bill of Rights, version 1
    19. Summarizing the Bill of Rights, version 2
    20. Summary
    21. Questions
    22. References
  10. Semantic Role Labeling with BERT-Based Transformers
    1. Getting started with SRL
    2. Defining Semantic Role Labeling
    3. Visualizing SRL
    4. Running a pretrained BERT-based model
    5. The architecture of the BERT-based model
    6. Setting up the BERT SRL environment
    7. SRL experiments with the BERT-based model
    8. Basic samples
    9. Sample 1
    10. Sample 2
    11. Sample 3
    12. Difficult samples
    13. Sample 4
    14. Sample 5
    15. Sample 6
    16. Summary
    17. Questions
    18. References
  11. Let Your Data Do the Talking: Story, Questions, and Answers
    1. Methodology
    2. Transformers and methods
    3. Method 0: Trial and error
    4. Method 1: NER first
    5. Using NER to find questions
    6. Location entity questions
    7. Person entity questions
    8. Method 2: SRL first
    9. Question-answering with ELECTRA
    10. Project management constraints
    11. Using SRL to find questions
    12. Next steps
    13. Exploring Haystack with a RoBERTa model
    14. Summary
    15. Questions
    16. References
  12. Detecting Customer Emotions to Make Predictions
    1. Getting started: Sentiment analysis transformers
    2. The Stanford Sentiment Treebank (SST)
    3. Sentiment analysis with RoBERTa-large
    4. Predicting customer behavior with sentiment analysis
    5. Sentiment analysis with DistilBERT
    6. Sentiment analysis with Hugging Face's models list
    7. DistilBERT for SST
    8. MiniLM-L12-H384-uncased
    9. RoBERTa-large-mnli
    10. BERT-base multilingual model
    11. Summary
    12. Questions
    13. References
  13. Analyzing Fake News with Transformers
    1. Emotional reactions to fake news
    2. Cognitive dissonance triggers emotional reactions
    3. Analyzing a conflictual Tweet
    4. Behavioral representation of fake news
    5. A rational approach to fake news
    6. Defining a fake news resolution roadmap
    7. Gun control
    8. Sentiment analysis
    9. Named entity recognition (NER)
    10. Semantic Role Labeling (SRL)
    11. Reference sites
    12. COVID-19 and former President Trump's Tweets
    13. Semantic Role Labeling (SRL)
    14. Before we go
    15. Looking for the silver bullet
    16. Looking for reliable training methods
    17. Summary
    18. Questions
    19. References
  14. Appendix: Answers to the Questions
    1. Chapter 1, Getting Started with the Model Architecture of the Transformer
    2. Chapter 2, Fine-Tuning BERT Models
    3. Chapter 3, Pretraining a RoBERTa Model from Scratch
    4. Chapter 4, Downstream NLP Tasks with Transformers
    5. Chapter 5, Machine Translation with the Transformer
    6. Chapter 6, Text Generation with OpenAI GPT-2 and GPT-3 Models
    7. Chapter 7, Applying Transformers to Legal and Financial Documents for AI Text Summarization
    8. Chapter 8, Matching Tokenizers and Datasets
    9. Chapter 9, Semantic Role Labeling with BERT-Based Transformers
    10. Chapter 10, Let Your Data Do the Talking: Story, Questions, and Answers
    11. Chapter 11, Detecting Customer Emotions to Make Predictions
    12. Chapter 12, Analyzing Fake News with Transformers
  15. Other Books You May Enjoy
  16. Index