Index

A

abstractive summaries

examples 186, 187

Adaptive Moment Estimation (Adam optimizer) 119

Attention mechanism 123

Audio-Visual Speech Recognition (AVSR) 228

B

Bahdanau Attention 126

Bahdanau attention layer 197, 198, 199

Batch Normalization (BatchNorm) 245

beam search 171, 180

used, for decoding with penalties 218, 219, 220

used, for improving text summarization 214, 216, 217

BERT-based transfer learning 123

attention model 125, 127

encoder-decoder networks 123, 124

transformer model 128, 130

BERT fine-tuning approach

for SQuAD question answering 341, 342

Bidirectional Encoder Representations from Transformers (BERT) model 132, 133

about 131

custom layers, building 142, 143, 144, 145, 146, 147

normalization 133, 134, 135, 136, 137, 138, 139

sequences 135

tokenization 133, 134, 135, 136, 137, 138, 139

Bi-directional Long Short-Term Memory (BiLSTM) 25, 47

Bilingual Evaluation Understudy (BLEU) 221, 280

BiLSTM baseline model 295

data tokenization 296, 297

data, vectorizing 296, 297

training, on weakly supervised data from Snorkel 322, 323, 324

used, for training 297, 298, 299, 300

BiLSTM model 65, 66, 67, 68, 69

building 83, 84, 85, 86

bottleneck design 244

Byte Pair Encoding (BPE) 26, 117, 132

C

captions

generating 274, 275, 276, 277, 278, 279, 280

cloud-based solutions, for building task-oriented conversational agents

intent identification 339

slot tagging 339

Common Objects in Context (COCO)

URL 235

conda environment

setting up 346

Conditional Random Fields (CRFs)

working 87, 89

Consensus-Based Image Description Evaluation (CIDEr) 280

constructor

parameters 195, 196

context-free vectorization 36

Continuous Bag-of-Words 41

Continuous Skip-gram 41

conversational agents

overview 328, 329, 330, 332, 334, 335, 336, 337, 338

conversational AI applications 330

conversation, with bot

example 331

Convolutional Neural Networks (CNNs)

convolutions 240, 241

image processing with 239

key properties 239

pooling 241

regularization, with dropout 242, 243

residual connections 243, 244, 245

ResNets 243, 244, 245

count-based vectorization 34

used, for modeling 35, 36

custom CRF Layer

implementing 91, 92

custom CRF model

implementing 93, 94

training, with loss function 94, 95

custom training

implementing 95, 96, 97, 98, 99

D

data

loading 75, 76, 77, 78, 79

modeling, with Parts-of-Speech (POS) tagging 30, 31

modeling, with stop words removed 24, 25, 26

normalizing 80, 81, 82, 83

vectorizing 80, 81, 82, 83

data locality 232

Decoder model 199, 200, 201, 202

training 202, 203, 204, 205, 206, 207

Dialogflow

agent configuration 333

console access 332

URL 332

domain adaptation 107

domains 107

dropout layer 102

E

embeddings 40

encoder 56

encoder-decoder network 123

Encoder model 194, 195, 196, 197

training 202, 203, 204, 205, 206, 207

encoding 58

F

feature extraction 110

feature extraction model 116, 117, 118, 119, 120

creating 121, 122

forward pass 101

G

Gap Sentence Generation (GSG) 224

Gated Recurrent Unit (GRU) 49, 51

gazetteer 73

URL 73

General Attention 125

general conversational agents 343, 344

generative adversarial networks (GANs) 289

Generative Pre-Training (GPT-2) model

about 171, 172, 173, 174, 176, 177

used, for text generation 177, 178, 179, 180, 181, 182, 183

Global Vectors for Word Representation (GloVe) 110

GloVe embeddings 111

used, for creating pre-trained embedding matrix 115, 116

used, for performing IMDb sentiment analysis 110

Google Colab

GPUs, enabling on 7, 8

GPUs

enabling, on Google Colab 7, 8

gradient clipping 49

greedy search

used, for improving text summarization 210, 211, 212, 213, 214

used, for text generation 164, 165, 166, 167, 168, 169, 170, 171

Groningen Meaning Bank (GMB) dataset 74

H

Hidden Markov Model (HMM)-based models 25

human-computer interaction (HCI) 46

I

image captioning 232, 233, 234

MS-COCO dataset, using for 235, 236, 237, 238

image feature extraction

performing, with ResNet50 245, 246, 247, 248, 249

image processing

with CNNs 239

with ResNet50 239

IMDb sentiment analysis

improving, with weakly supervised labels 290

performing, with GloVe embeddings 110

IMDb training data

loading 112, 113, 114

inner workings, of weak supervision

with labeling functions 288, 289, 290

Inside-Outside-Beginning (IOB) 77

Inverse Document Frequency 37

K

knowledge base (KB) 340

L

labeled data

collecting 3

development environment setup, for collection of 4, 5, 6

labeling functions 288

iterating on 304, 305, 306

Language Model (LM) 128

language models

training cost 172

layer normalization 174

learning rate annealing 159

learning rate decay

about 159

implementing, as custom callback 159, 160, 161, 162, 163, 164

learning rate warmup 160

lemma 32

lemmatization 31, 32, 33

longest common subsequence (LCS) 222

Long Short-Term Memory (LSTM) 49

cell value 50

forget gate 50

input gate 50

output gate 50

Long Short-Term Memory (LSTM) networks 50, 51

LSTM model

with embeddings 62, 63, 64, 65

M

Machine Learning (ML) project 2

MAchine Reading COmprehension (MARCO) 341

masked language model (MLM) 224

Masked Language Model (MLM) task 131

Max pooling 241

Metric for Evaluation of Translation with Explicit Ordering (METEOR) 221

model

building, steps 234

training 155, 156, 157, 158, 159

morphology 32

MRC conversational agents 340, 341

MS-COCO dataset

used, for image captioning 235, 236, 237, 238

Multi-Head Attention block 130

multi-modal deep learning 228

language tasks 229, 230, 231

vision 229, 230, 231

multi-task learning 108, 109

N

Naïve Bayes (NB) model 306

used, for finding keywords 306, 307, 309, 310, 312, 313

Named Entity Recognition (NER) 72, 73, 74

GMB dataset 74, 75

using, with BiLSTM 89

using, with CRFs 89, 90

natural language generation (NLG) 340

Natural Language Processing (NLP) 229

Natural Language Understanding (NLU) 46

natural language understanding (NLU) module 328

NER datasets

URL 73

News Aggregator dataset 151

normalization 55

P

padding 58, 59, 60

Parts-of-Speech (POS) tagging 26, 27, 28, 29, 30

data, modeling with 30, 31

penalties

coverage normalization 218, 219, 220

decoding, with beam search 218

length normalization 218

performance

improving 281, 282

performance optimization

with tf.data 61

Porter stemmer 31

prebuilt BERT classification model 139, 140, 141

pre-processing, IMDb dataset 291, 292, 293, 294

pre-trained embedding matrix

creating, with GloVe embeddings 115, 116

pre-trained GloVe embeddings

loading 114, 115

pre-training 106

Q

question-answering setting 340

R

ragged tensors 59

re3d

URL 73

Recall-Oriented Understudy for Gisting Evaluation (ROUGE) 221

Recurrent Neural Networks (RNNs) 7

representation learning 40

ResNet50

image feature extraction, performing with 245, 246, 247, 248, 249

image processing with 239

RNN

building blocks 48, 49

root morpheme 32

ROUGE metric

evaluating 221, 222, 224

ROUGE-L 222

ROUGE-N 221

ROUGE-S 222

ROUGE-W 222

S

segmentation 13, 15

in Japanese 13, 15, 17, 18

self-attention 126

sentence compression 186

sentiment classification, with LSTMs 51, 52

data, loading 52, 53, 55

Seq2Seq model 123

building, with attention layer 193

Seq2Seq model, with attention layer

Bahdanau attention layer 197, 198, 199

building 193, 194

Decoder model 199, 200, 201, 202

Encoder model 194, 195, 196, 197

sequential learning 109, 110

Skip-gram Negative Sampling (SGNS) 41

Snorkel

used, for weakly supervised labeling 300, 301, 302, 303, 304

sparse representations 38

Stanford Question Answering Dataset (SQuAD) 3, 341

state-of-the-art approach 224, 225

state-of-the-art models 281, 282

stemming 31, 32, 33

Stochastic Gradient Descent (SGD) 174

stop word removal 20, 21, 22, 23, 24, 25

stride length 240

subject matter experts (SMEs) 287

subword tokenization 132

subword tokenizer 294, 295

summaries

generating 208, 209, 210

T

tasks 107

teacher forcing process 200

temperature 167

Term Frequency - Inverse Document Frequency (TF-IDF) 37, 38

Term Frequency (TF) 37

text generation

character-based approach 150

data loading 151, 152

data normalization 152, 153, 154

data pre-processing 151, 152

data tokenization 152, 153, 154

GPT-2 model, using 177, 178, 179, 180, 181, 182, 183

improving, with greedy search 164, 165, 166, 167, 168, 169, 170, 171

text normalization 8, 9, 10

normalized data, modeling 11, 12, 13

stop word removal 20, 21, 23, 24

tokenization 13

text processing workflow 2

data collection 2, 3

data labeling 2, 3

stages 2

text normalization 8

text vectorization 33

text summaries

data loading 188, 189

data pre-processing 188, 189

data tokenization 190, 191, 192, 193

data vectorization 190, 191, 192, 193

evaluating 221

generating 207

overview 186, 188

text summaries, approaches

abstractive summarization 186

extractive summarization 186

text vectorization 33, 34

count-based vectorization 34

Term Frequency - Inverse Document Frequency (TF-IDF) 37

word vectors 40

TF-IDF features

used, for modeling 39

tokenization 55, 58

tokenized data

modeling 19, 20

tokenizer 56

Top-K sampling

used, for text generation 181, 182, 183

transfer learning

considerations 106

overview 106

types 107

transfer learning, types

domain adaptation 107, 108

multi-task learning 108, 109

sequential learning 109, 110

Transformer architecture 123

Transformer model 125, 249, 250, 251

creating 263, 264

Decoder 260, 261, 262, 263

masks 251, 252, 253

multi-head attention 253, 254, 255, 256

positional encoding 251, 252, 253

scaled dot-product 253, 254, 255, 256

training, with VisualEncoder 264

VisualEncoder 257, 258, 259, 260

Transformer model, training with VisualEncoder 264

checkpoints 270, 271

custom learning rate schedule 268, 269

custom training 272, 273, 274

instantiating 267, 268

loss function 270

masks 270, 271

metrics 270

training data, loading 265, 267

U

Universal POS (UPOS) tags 26

unsupervised labels

generating, from unlabeled data 319, 320, 322

V

vectorization 55

Visual Commonsense Reasoning (VCR) 230

URL 230

visual grounding 231

Visual Question Answering (VQA) 229

URL 229

Viterbi decoder 99

Viterbi decoding 99, 100

first word label probability 101, 102, 103

W

weakly supervised data, from Snorkel

BiLSTM baseline model, training on 322, 323, 324

weakly supervised labeling

with Snorkel 300, 301, 302, 303, 304

weakly supervised labels

evaluating, on training data set 314, 315, 316, 317, 318

using, to improve IMDb sentiment analysis 290

weak supervision 286, 287, 288

Windows Subsystem for Linux (WSL) 152

Word2Vec embeddings

using, with pretrained models 42, 43

WordPiece tokenization 132

word vectors 40, 41
