Index
A
abstractive summaries 186
Adaptive Moment Estimation (Adam) optimizer 119
Attention mechanism 123
Audio-Visual Speech Recognition (AVSR) 228
B
Bahdanau Attention 126
Bahdanau attention layer 197, 198, 199
Batch Normalization (BatchNorm) 245
beam search
used, for decoding penalties 218, 219, 220
used, for improving text summarization 214, 216, 217
BERT-based transfer learning 123
encoder-decoder networks 123, 124
BERT fine-tuning approach
for SQuAD question answering 341, 342
Bidirectional Encoder Representations from Transformers (BERT) model 132, 133
about 131
custom layers, building 142, 143, 144, 145, 146, 147
normalization 133, 134, 135, 136, 137, 138, 139
sequences 135
tokenization 133, 134, 135, 136, 137, 138, 139
Bi-directional Long Short-Term Memory (BiLSTM) 25, 47
Bilingual Evaluation Understudy (BLEU) 221, 280
BiLSTM baseline model 295
training, on weakly supervised data from Snorkel 322, 323, 324
used, for training 297, 298, 299, 300
BiLSTM model 65, 66, 67, 68, 69
bottleneck design 244
Byte Pair Encoding (BPE) 26, 117, 132
C
captions
generating 274, 275, 276, 277, 278, 279, 280
cloud-based solutions, for building task-oriented conversational agents
intent identification 339
slot tagging 339
Common Objects in Context (COCO)
URL 235
conda environment
setting up 346
Conditional Random Fields (CRFs)
Consensus-Based Image Description Evaluation (CIDEr) 280
constructor
context-free vectorization 36
Continuous Bag-of-Words 41
Continuous Skip-gram 41
conversational agents
overview 328, 329, 330, 332, 334, 335, 336, 337, 338
conversational AI applications 330
conversation, with bot
example 331
Convolutional Neural Networks (CNNs)
image processing with 239
key properties 239
pooling 241
regularization, with dropout 242, 243
residual connections 243, 244, 245
count-based vectorization 34
custom CRF layer
custom CRF model
training, with loss function 94, 95
custom training
implementing 95, 96, 97, 98, 99
D
data
modeling, with Parts-of-Speech (POS) tagging 30, 31
modeling, with stop words removed 24, 25, 26
data locality 232
Decoder model 199, 200, 201, 202
training 202, 203, 204, 205, 206, 207
Dialogflow
agents configuration 333
console access 332
URL 332
domain adaptation 107
domains 107
dropout layer 102
E
embeddings 40
encoder 56
encoder-decoder network 123
Encoder model 194, 195, 196, 197
training 202, 203, 204, 205, 206, 207
encoding 58
F
feature extraction 110
feature extraction model 116, 117, 118, 119, 120
forward pass 101
G
Gap Sentence Generation (GSG) 224
Gated Recurrent Unit (GRU) 49, 51
gazetteer 73
URL 73
General Attention 125
general conversational agents 343, 344
generative adversarial networks (GANs) 289
Generative Pre-Training (GPT-2) model
about 171, 172, 173, 174, 176, 177
used, for text generation 177, 178, 179, 180, 181, 182, 183
Global Vectors for Word Representation (GloVe) 110
GloVe embeddings 111
used, for creating pre-trained embedding matrix 115, 116
used, for performing IMDb sentiment analysis 110
Google Colab
GPUs
enabling, on Google Colab 7, 8
gradient clipping 49
greedy search
used, for improving text summarization 210, 211, 212, 213, 214
used, for text generation 164, 165, 166, 167, 168, 169, 170, 171
Groningen Meaning Bank (GMB) dataset 74
H
Hidden Markov Model (HMM)-based models 25
human-computer interaction (HCI) 46
I
image captioning 232, 233, 234
MS-COCO dataset, using for 235, 236, 237, 238
image feature extraction
performing, with ResNet50 245, 246, 247, 248, 249
image processing
with CNNs 239
with ResNet50 239
IMDb sentiment analysis
improving, with weakly supervised labels 290
performing, with GloVe embeddings 110
IMDb training data
inner workings, of weak supervision
with labeling functions 288, 289, 290
Inside-Outside-Beginning (IOB) 77
Inverse Document Frequency 37
K
knowledge base (KB) 340
L
labeled data
collecting 3
development environment setup, for collection of 4, 5, 6
labeling functions 288
Language Model (LM) 128
language models
training cost 172
layer normalization 174
learning rate annealing 159
learning rate decay
about 159
implementing, as custom callback 159, 160, 161, 162, 163, 164
learning rate warmup 160
lemma 32
longest common subsequence (LCS) 222
Long Short-Term Memory (LSTM) 49
cell value 50
forget gate 50
input gate 50
output gate 50
Long Short-Term Memory (LSTM) networks 50, 51
LSTM model
with embeddings 62, 63, 64, 65
M
Machine Learning (ML) project 2
MAchine Reading COmprehension (MARCO) 341
masked language model (MLM) 224
Masked Language Model (MLM) task 131
Max pooling 241
Metric for Evaluation of Translation with Explicit Ordering (METEOR) 221
model
building, steps 234
training 155, 156, 157, 158, 159
morphology 32
MRC conversational agents 340, 341
MS-COCO dataset
used, for image captioning 235, 236, 237, 238
Multi-Head Attention block 130
multi-modal deep learning 228
N
Naïve-Bayes (NB) model 306
used, for finding keywords 306, 307, 309, 310, 312, 313
Named Entity Recognition (NER) 72, 73, 74
using, with BiLSTM 89
natural language generation (NLG) 340
Natural Language Processing (NLP) 229
Natural Language Understanding (NLU) 46
natural language understanding (NLU) module 328
NER datasets
URL 73
News Aggregator dataset 151
normalization 55
P
Parts-of-Speech (POS) tagging 26, 27, 28, 29, 30
penalties
coverage normalization 218, 219, 220
decoding, with beam search 218
length normalization 218
performance optimization
with tf.data 61
Porter stemmer 31
prebuilt BERT classification model 139, 140, 141
pre-processing IMDb dataset 291, 292, 293, 294
pre-trained embedding matrix
creating, with GloVe embeddings 115, 116
pre-trained GloVe embeddings
pre-training 106
Q
question-answering setting 340
R
ragged tensors 59
re3d
URL 73
Recall-Oriented Understudy for Gisting Evaluation (ROUGE) 221
Recurrent Neural Networks (RNNs) 7
representation learning 40
ResNet50
image feature extraction, performing with 245, 246, 247, 248, 249
image processing with 239
RNN
root morpheme 32
ROUGE metric
ROUGE-L 222
ROUGE-N 221
ROUGE-S 222
ROUGE-W 222
S
self-attention 126
sentence compression 186
sentiment classification, with LSTMs 51, 52
Seq2Seq model 123
building, with attention layer 193
Seq2Seq model, with attention layer
Bahdanau attention layer 197, 198
Decoder model 199, 200, 201, 202
Encoder model 194, 195, 196, 197
Skip-gram Negative Sampling (SGNS) 41
Snorkel
used, for weakly supervised labelling 300, 301, 302, 303, 304
sparse representations 38
Stanford Question Answering Dataset (SQuAD) 3, 341
state-of-the-art approach 224, 225
state-of-the-art models 281, 282
Stochastic Gradient Descent (SGD) 174
stop word removal 20, 21, 22, 23, 24, 25
stride length 240
subject matter experts (SMEs) 287
subword tokenization 132
summaries
T
tasks 107
teacher forcing process 200
temperature 167
Term Frequency - Inverse Document Frequency (TF-IDF) 37, 38
Term Frequency (TF) 37
text generation
character-based approach 150
data normalization 152, 153, 154
data tokenization 152, 153, 154
GPT-2 model, using 177, 178, 179, 180, 181, 182, 183
improving, with greedy search 164, 165, 166, 167, 168, 169, 170, 171
text normalization 8
normalized data, modeling 11, 12, 13
stop word removal 20, 21, 23, 24
tokenization 13
text processing workflow 2
stages 2
text summaries
data tokenization 190, 191, 192, 193
data vectorization 190, 191, 192, 193
evaluating 221
generating 207
text summaries, approaches
abstractive summarization 186
extractive summarization 186
text vectorization 33
count-based vectorization 34
Term Frequency - Inverse Document Frequency (TF-IDF) 37
word vectors 40
TF-IDF features
used, for modeling 39
tokenized data
tokenizer 56
Top-K sampling
used, for text generation 181, 182, 183
transfer learning
considerations 106
overview 106
types 107
Transformer architecture 123
Transformer model 125, 249, 250, 251
multi-head attention 253, 254, 255, 256
positional encoding 251, 252, 253
scaled dot-product 253, 254, 255, 256
training, with VisualEncoder 264
VisualEncoder 257, 258, 259, 260
Transformer model, training with VisualEncoder 264
custom learning rate schedule 268, 269
loss function 270
metrics 270
training data, loading 265, 267
U
Universal POS (UPOS) tags 26
unsupervised labels
generating, from unlabeled data 319, 320, 322
V
vectorization 55
Visual Commonsense Reasoning (VCR) 230
URL 230
VisualEncoder
Transformer model, training with 264
visual grounding 231
Visual Question Answering (VQA) 229
URL 229
Viterbi decoder 99
first word label probability 101, 102, 103
W
weakly supervised data, from Snorkel
BiLSTM baseline model, training on 322, 323, 324
weakly supervised labelling
with Snorkel 300, 301, 302, 303, 304
weakly supervised labels
evaluating, on training data set 314, 315, 316, 317, 318
using, to improve IMDb sentiment analysis 290
weak supervision 286, 287, 288
Windows Subsystem for Linux (WSL) 152
Word2Vec embeddings
using, with pretrained models 42, 43
WordPiece tokenization 132