
Kickstart your NLP journey by exploring BERT and its variants such as ALBERT, RoBERTa, DistilBERT, VideoBERT, and more with Hugging Face's transformers library

Key Features

  • Explore the encoder and decoder of the transformer model
  • Become well-versed with BERT, along with variants such as ALBERT, RoBERTa, and DistilBERT
  • Discover how to pre-train and fine-tune BERT models for several NLP tasks

Book Description

BERT (Bidirectional Encoder Representations from Transformers) has revolutionized the world of natural language processing (NLP) by setting state-of-the-art results on a wide range of tasks. This book is an introductory guide that will help you get to grips with Google's BERT architecture. With a detailed explanation of the transformer architecture, this book will help you understand how the transformer's encoder and decoder work.

You'll explore the BERT architecture by learning how the BERT model is pre-trained and how to fine-tune pre-trained BERT for downstream NLP tasks such as sentiment analysis and text summarization with the Hugging Face transformers library. As you advance, you'll learn about different variants of BERT such as ALBERT, RoBERTa, and ELECTRA, and look at SpanBERT, which is used for NLP tasks like question answering. You'll also cover simpler and faster BERT variants based on knowledge distillation, such as DistilBERT and TinyBERT. The book then takes you through M-BERT, XLM, and XLM-R in detail before introducing you to Sentence-BERT, which is used for obtaining sentence representations. Finally, you'll discover domain-specific BERT models such as BioBERT and ClinicalBERT, and explore an interesting variant called VideoBERT.

By the end of this BERT book, you'll be well-versed with using BERT and its variants for performing practical NLP tasks.
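
As a taste of the hands-on approach used throughout the book, the following is a minimal sketch of extracting contextual embeddings from pre-trained BERT with the Hugging Face transformers library; the model name, the example sentence, and the surrounding boilerplate are illustrative assumptions rather than code taken from the book.

    import torch
    from transformers import BertTokenizer, BertModel

    # Load the pre-trained BERT-base model and its WordPiece tokenizer
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')

    # Tokenize an example sentence and convert it to PyTorch tensors
    inputs = tokenizer('I love Paris', return_tensors='pt')

    # Run a forward pass without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state holds one contextual embedding per token:
    # [batch_size, sequence_length, hidden_size], here [1, 5, 768]
    print(outputs.last_hidden_state.shape)

Chapter 3, Getting Hands-On with BERT, builds on this workflow, including extracting embeddings from all encoder layers and fine-tuning BERT for downstream tasks.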

What you will learn

  • Understand the transformer model from the ground up
  • Find out how BERT works and pre-train it using the masked language modeling (MLM) and next sentence prediction (NSP) tasks (a minimal MLM sketch follows this list)
  • Get hands-on with BERT by learning to generate contextual word and sentence embeddings
  • Fine-tune BERT for downstream tasks
  • Get to grips with ALBERT, RoBERTa, ELECTRA, and SpanBERT models
  • Get the hang of BERT models based on knowledge distillation, such as DistilBERT and TinyBERT
  • Understand cross-lingual models such as XLM and XLM-R
  • Explore Sentence-BERT, VideoBERT, and BART
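
As a quick illustration of the masked language modeling task mentioned above, here is a minimal sketch (not code from the book) that uses the Hugging Face transformers fill-mask pipeline with a pre-trained BERT model to predict a masked word; the model name and the example sentence are illustrative assumptions.

    from transformers import pipeline

    # Load a fill-mask pipeline backed by pre-trained BERT-base
    unmasker = pipeline('fill-mask', model='bert-base-uncased')

    # BERT was pre-trained with MLM: it predicts the token hidden behind [MASK]
    predictions = unmasker('Paris is the [MASK] of France.')

    # Each prediction contains the filled-in token and its score
    for p in predictions:
        print(p['token_str'], round(p['score'], 4))

This is the same MLM objective covered in Chapter 2, Understanding the BERT Model.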

Who this book is for

This book is for NLP professionals and data scientists looking to simplify NLP tasks to enable efficient language understanding using BERT. A basic understanding of NLP concepts and deep learning is required to get the best out of this book.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Getting Started with Google BERT
  3. Dedication
  4. About Packt
    1. Why subscribe?
  5. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  7. Section 1 - Starting Off with BERT
  8. A Primer on Transformers
    1. Introduction to the transformer 
    2. Understanding the encoder of the transformer 
    3. Self-attention mechanism 
    4. Understanding the self-attention mechanism 
    5. Step 1
    6. Step 2
    7. Step 3
    8. Step 4
    9. Multi-head attention mechanism 
    10. Learning position with positional encoding 
    11. Feedforward network
    12. Add and norm component 
    13. Putting all the encoder components together 
    14. Understanding the decoder of a transformer
    15. Masked multi-head attention 
    16. Multi-head attention 
    17. Feedforward network
    18. Add and norm component 
    19. Linear and softmax layers
    20. Putting all the decoder components together 
    21. Putting the encoder and decoder together 
    22. Training the transformer
    23. Summary
    24. Questions
    25. Further reading
  9. Understanding the BERT Model
    1. Basic idea of BERT 
    2. Working of BERT 
    3. Configurations of BERT 
    4. BERT-base
    5. BERT-large
    6. Other configurations of BERT
    7. Pre-training the BERT model
    8. Input data representation
    9. Token embedding 
    10. Segment embedding
    11. Position embedding 
    12. Final representation 
    13. WordPiece tokenizer 
    14. Pre-training strategies 
    15. Language modeling
    16. Auto-regressive language modeling 
    17. Auto-encoding language modeling
    18. Masked language modeling
    19. Whole word masking 
    20. Next sentence prediction 
    21. Pre-training procedure 
    22. Subword tokenization algorithms 
    23. Byte pair encoding 
    24. Tokenizing with BPE 
    25. Byte-level byte pair encoding 
    26. WordPiece
    27. Summary
    28. Questions
    29. Further reading
  10. Getting Hands-On with BERT
    1. Exploring the pre-trained BERT model
    2. Extracting embeddings from pre-trained BERT 
    3. Hugging Face transformers 
    4. Generating BERT embeddings
    5. Preprocessing the input 
    6. Getting the embedding 
    7. Extracting embeddings from all encoder layers of BERT
    8. Extracting the embeddings 
    9. Preprocessing the input
    10. Getting the embeddings 
    11. Fine-tuning BERT for downstream tasks
    12. Text classification 
    13. Fine-tuning BERT for sentiment analysis 
    14. Importing the dependencies 
    15. Loading the model and dataset
    16. Preprocessing the dataset
    17. Training the model 
    18. Natural language inference 
    19. Question-answering
    20. Performing question-answering with fine-tuned BERT 
    21. Preprocessing the input
    22. Getting the answer
    23. Named entity recognition 
    24. Summary 
    25. Questions
    26. Further reading 
  11. Section 2 - Exploring BERT Variants
  12. BERT Variants I - ALBERT, RoBERTa, ELECTRA, and SpanBERT
    1. A Lite version of BERT 
    2. Cross-layer parameter sharing 
    3. Factorized embedding parameterization
    4. Training the ALBERT model
    5. Sentence order prediction
    6. Comparing ALBERT with BERT 
    7. Extracting embeddings with ALBERT
    8. Robustly Optimized BERT pre-training Approach
    9. Using dynamic masking instead of static masking 
    10. Removing the NSP task
    11. Training with more data points
    12. Training with a large batch size 
    13. Using BBPE as a tokenizer 
    14. Exploring the RoBERTa tokenizer 
    15. Understanding ELECTRA 
    16. Understanding the replaced token detection task 
    17. Exploring the generator and discriminator of ELECTRA 
    18. Training the ELECTRA model
    19. Exploring efficient training methods
    20. Predicting span with SpanBERT
    21. Understanding the architecture of SpanBERT
    22. Exploring SpanBERT 
    23. Performing Q&As with pre-trained SpanBERT
    24. Summary
    25. Questions
    26. Further reading 
  13. BERT Variants II - Based on Knowledge Distillation
    1. Introducing knowledge distillation 
    2. Training the student network 
    3. DistilBERT – the distilled version of BERT 
    4. Teacher-student architecture 
    5. The teacher BERT
    6. The student BERT
    7. Training the student BERT (DistilBERT) 
    8. Introducing TinyBERT 
    9. Teacher-student architecture  
    10. Understanding the teacher BERT  
    11. Understanding the student BERT 
    12. Distillation in TinyBERT 
    13. Transformer layer distillation 
    14. Attention-based distillation
    15. Hidden state-based distillation 
    16. Embedding layer distillation 
    17. Prediction layer distillation
    18. The final loss function 
    19. Training the student BERT (TinyBERT)
    20. General distillation 
    21. Task-specific distillation 
    22. The data augmentation method 
    23. Transferring knowledge from BERT to neural networks
    24. Teacher-student architecture 
    25. The teacher BERT 
    26. The student network 
    27. Training the student network  
    28. The data augmentation method
    29. Understanding the masking method
    30. Understanding the POS-guided word replacement method 
    31. Understanding the n-gram sampling method
    32. The data augmentation procedure
    33. Summary
    34. Questions
    35. Further reading 
  14. Section 3 - Applications of BERT
  15. Exploring BERTSUM for Text Summarization
    1. Text summarization 
    2. Extractive summarization
    3. Abstractive summarization 
    4. Fine-tuning BERT for text summarization 
    5. Extractive summarization using BERT 
    6. BERTSUM with a classifier 
    7. BERTSUM with a transformer and LSTM 
    8. BERTSUM with an inter-sentence transformer 
    9. BERTSUM with LSTM 
    10. Abstractive summarization using BERT 
    11. Understanding ROUGE evaluation metrics
    12. Understanding the ROUGE-N metric 
    13. ROUGE-1 
    14. ROUGE-2 
    15. Understanding ROUGE-L  
    16. The performance of the BERTSUM model 
    17. Training the BERTSUM model 
    18. Summary 
    19. Questions
    20. Further reading
  16. Applying BERT to Other Languages
    1. Understanding multilingual BERT 
    2. Evaluating M-BERT on the NLI task 
    3. Zero-shot 
    4. TRANSLATE-TEST 
    5. TRANSLATE-TRAIN
    6. TRANSLATE-TRAIN-ALL
    7. How multilingual is multilingual BERT? 
    8. Effect of vocabulary overlap
    9. Generalization across scripts 
    10. Generalization across typological features 
    11. Effect of language similarity
    12. Effect of code switching and transliteration
    13. Code switching 
    14. Transliteration 
    15. M-BERT on code switching and transliteration 
    16. The cross-lingual language model
    17. Pre-training strategies 
    18. Causal language modeling 
    19. Masked language modeling 
    20. Translation language modeling 
    21. Pre-training the XLM model
    22. Evaluation of XLM
    23. Understanding XLM-R
    24. Language-specific BERT 
    25. FlauBERT for French 
    26. Getting a representation of a French sentence with FlauBERT 
    27. French Language Understanding Evaluation
    28. BETO for Spanish 
    29. Predicting masked words using BETO 
    30. BERTje for Dutch
    31. Next sentence prediction with BERTje
    32. German BERT 
    33. Chinese BERT 
    34. Japanese BERT 
    35. FinBERT for Finnish
    36. UmBERTo for Italian 
    37. BERTimbau for Portuguese 
    38. RuBERT for Russian 
    39. Summary
    40. Questions
    41. Further reading
  17. Exploring Sentence and Domain-Specific BERT
    1. Learning about sentence representation with Sentence-BERT  
    2. Computing sentence representation 
    3. Understanding Sentence-BERT 
    4. Sentence-BERT with a Siamese network 
    5. Sentence-BERT for a sentence pair classification task
    6. Sentence-BERT for a sentence pair regression task
    7. Sentence-BERT with a triplet network
    8. Exploring the sentence-transformers library 
    9. Computing sentence representation using Sentence-BERT 
    10. Computing sentence similarity 
    11. Loading custom models
    12. Finding a similar sentence with Sentence-BERT 
    13. Learning multilingual embeddings through knowledge distillation 
    14. Teacher-student architecture
    15. Using the multilingual model 
    16. Domain-specific BERT 
    17. ClinicalBERT 
    18. Pre-training ClinicalBERT 
    19. Fine-tuning ClinicalBERT 
    20. Extracting clinical word similarity 
    21. BioBERT 
    22. Pre-training the BioBERT model
    23. Fine-tuning the BioBERT model 
    24. BioBERT for NER tasks 
    25. BioBERT for question answering 
    26. Summary 
    27. Questions
    28. Further reading
  18. Working with VideoBERT, BART, and More
    1. Learning language and video representations with VideoBERT 
    2. Pre-training a VideoBERT model  
    3. Cloze task 
    4. Linguistic-visual alignment  
    5. The final pre-training objective 
    6. Data source and preprocessing 
    7. Applications of VideoBERT 
    8. Predicting the next visual tokens
    9. Text-to-video generation 
    10. Video captioning 
    11. Understanding BART 
    12. Architecture of BART 
    13. Noising techniques 
    14. Token masking
    15. Token deletion
    16. Token infilling 
    17. Sentence shuffling 
    18. Document rotation
    19. Comparing different pre-training objectives 
    20. Performing text summarization with BART 
    21. Exploring BERT libraries 
    22. Understanding ktrain
    23. Sentiment analysis using ktrain
    24. Building a document answering model 
    25. Document summarization 
    26. bert-as-service 
    27. Installing the library 
    28. Computing sentence representation
    29. Computing contextual word representation 
    30. Summary 
    31. Questions 
    32. Further reading
  19. Assessments
    1. Chapter 1, A Primer on Transformers
    2. Chapter 2, Understanding the BERT Model
    3. Chapter 3, Getting Hands-On with BERT
    4. Chapter 4, BERT Variants I – ALBERT, RoBERTa, ELECTRA, SpanBERT
    5. Chapter 5, BERT Variants II – Based on Knowledge Distillation
    6. Chapter 6, Exploring BERTSUM for Text Summarization
    7. Chapter 7, Applying BERT to Other Languages
    8. Chapter 8, Exploring Sentence- and Domain-Specific BERT
    9. Chapter 9, Working with VideoBERT, BART, and More
  20. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think