A
- ablation studies 272
- abstraction
- abstract analogies 497
- two poles of 498, 501
- cognition as combination of both kinds of abstraction 500–501
- program-centric analogy 499–500
- value-centric analogy 498–499
- Abstraction and Reasoning Corpus (ARC) 495
- activations
- CAM (class activation map) visualization 295–299
- visualizing heatmaps of class 295–299
- visualizing intermediate 283–289
- adapt() method 342
- adversarial examples 486
- adversarial network 448–452
- AGI (artificial general intelligence) 508
- AI (artificial intelligence)
- deep learning and 2–3
- promise of 12–13
- setting course toward greater generality in 493–496
- new target 495–496
- shortcut rule 493–495
- various approaches to 475
- AI summer 476
- algorithmic modules 508
- algorithms 22
- all_dims() object 79
- ambiguous features 133
- Analytical Engine 3
- append() method 512
- ARC (Abstraction and Reasoning Corpus) 495
- architecture patterns 269–282
- batch normalization 275–278
- depthwise separable convolutions 278–280
- MHR (modularity, hierarchy, and reuse) formula 269
- mini Xception-like model 280–282
- residual connections 272–275
- architecture priors 149–150
- array objects 31
- array_reshape() function 29
- artificial general intelligence (AGI) 508
- artificial intelligence. See AI
- arXiv 509
- as.data.frame() method 112
- assign method 82
- automated machine learning 462
- automatic differentiation with computation graphs 55–58
- automatic shape inference 91–93
- automatons, intelligent agents vs. 488
B
- backpropagation algorithm 9, 54–59
- automatic differentiation with computation graphs 55–58
- chain rule 54–55
- gradient tape in TensorFlow 58–59
- bag-of-words approach 347–354
- bigrams with binary encoding 350–351
- bigrams with TF-IDF encoding 352–354
- single words (unigrams) with binary encoding 347–350
- when to use sequence models over 381–382
- bag-of-words models 338
- baseline, beating 176
- Basic Linear Algebra Subprograms (BLAS) 39
- batch generator 63
- batch_size argument 357
- BatchNormalization layers 276
- best practices
- getting most out of models 455–464
- hyperparameter optimization 455–462
- model ensembling 462–464
- scaling-up model training 464–472
- multi-GPU training 468–471
- speeding up training on GPU with mixed precision 465–467
- TPU training 471–472
- bias vector 277
- bidirectional layer 331
- bidirectional RNNs 329–332
- bigrams
- with binary encoding 350–351
- with TF-IDF encoding 352–354
- binary classification example 105–114
- building model 108–110
- IMDB dataset 105–106
- preparing data 107
- using trained model to generate predictions on new data 113
- validating approach 110–113
- binary encoding 350–351
- binary_crossentropy loss function 110
- BLAS (Basic Linear Algebra Subprograms) 39
- border effects 226–227
- Boston housing price dataset 122
- broadcasting 40–41, 79
- browser, deploying model in 181–182
- build() method 90, 187, 457
C
- call() method 61, 90, 190, 374
- Callback class 205
- callbacks 204–205
- early stopping callbacks 204–205
- text-generation callback with variable-temperature sampling 408–413
- writing 205–207
- CAM (class activation map) visualization 295–299
- categorical encoding 116
- causal padding 394
- CelebA dataset 445–447
- chain rule 54–55
- channels-first convention 37
- channels-last convention 37
- character-level tokenization 338
- class objects 522
- class statement 522
- classes 522–526
- iterators 525–526
- underscores 523–525
- classification
- binary classification example 105–114
- building model 108–110
- IMDB dataset 105–106
- preparing data 107
- using trained model to generate predictions on new data 113
- validating approach 110–113
- multiclass classification example 114–121
- building model 116–117
- generating predictions on new data 119–120
- handling labels and loss 120
- large intermediate layers, importance of 120–121
- preparing data 115–116
- Reuters dataset 114–115
- validating approach 117–119
- classname argument 375
- cognition 500–501
- cognitive automation 493
- combinatorial explosion 505
- common-sense baselines 145–146
- compilation step 29
- compile step, Keras APIs 95–98
- compile() method 95
- computation graphs, automatic differentiation with 55–58
- compute_dtype property 467
- compute_mask() method 361
- computer vision
- convnets (convolutional neural networks) 221–230
- convolution operation 223–227
- max-pooling operation 228–230
- essential computer vision tasks 259–260
- image segmentation example 260–269
- interpreting what convnets learn 283–299
- visualizing convnet filters 289–295
- visualizing heatmaps of class activation 295–299
- visualizing intermediate activations 283–289
- modern convnet architecture patterns 269–282
- batch normalization 275–278
- depthwise separable convolutions 278–280
- MHR (modularity, hierarchy, and reuse) formula 269
- mini Xception-like model 280–282
- residual connections 272–275
- pretrained models 245–257
- feature extraction with pretrained model 246–253
- fine-tuning pretrained model 254–257
- training convnet on small dataset 230–245
- building model 234–235
- data preprocessing 235–241
- downloading data 231–233
- relevance of deep learning for small data problems 230–231
- using data augmentation 241–245
- concept drift 171
- concept vectors 433
- constant tensors 81–82
- container types 512–517
- dictionaries 516–517
- lists 512–514
- sets 517
- tuples 514–516
- content loss 423–424
- context management 532
- conv_base model 249
- Conv1D layers 481
- Conv2D layers 222, 266, 481
- Conv2DTranspose layers 266
- Conv3D layers 481
- convnets (convolutional neural networks) 221–230
- architecture patterns 269–282
- batch normalization 275–278
- depthwise separable convolutions 278–280
- MHR (modularity, hierarchy, and reuse) formula 269
- mini Xception-like model 280–282
- residual connections 272–275
- convolution operation 223–227
- border effects and padding 226–227
- convolution strides 227
- interpretability 283–299
- visualizing convnet filters 289–295
- visualizing heatmaps of class activation 295–299
- visualizing intermediate activations 283–289
- max-pooling operation 228–230
- overview 481
- training on small dataset 230–245
- building model 234–235
- data preprocessing 235–241
- downloading data 231–233
- relevance of deep learning for small data problems 230–231
- using data augmentation 241–245
- convolution kernel 225
- convolution operation 223–227
- border effects and padding 226–227
- convolution strides 227
- convolutional base 246
- convolutional neural networks 221
- cost function 9
- cross-entropy 110
- CUDA 73–74
- cuDNN 73–74
- cuDNN kernel 279
D
- data
- collecting 169–173
- beware of nonrepresentative data 171–173
- investing in data annotation infrastructure 170–171
- convnets (convolutional neural networks)
- downloading 231–233
- preprocessing 235–241
- driving advances of deep learning 21–22
- exploring 173
- learning rules and representations from 4–7
- metric for success 173–174
- neural networks, data representations for 31–37
- data batches 35
- image data 36–37
- key attributes 33–34
- manipulating tensors in R 34
- matrices (rank 2 tensors) 32
- rank 3 and higher-rank tensors 32–33
- real-world examples of data tensors 35
- scalars (rank 0 tensors) 31
- time-series data or sequence data 36
- vector data 35–36
- vectors (rank 1 tensors) 31–32
- video data 37
- preparing 174–175
- handling missing values 175
- value normalization 174–175
- vectorization 174
- data augmentation 230, 241–245
- fast feature extraction without 249–251
- feature extraction together with 251–253
- data distillation 29
- data parallelism 468
- data vectorization 174
- dataset curation 152–153
- Dataset object 100
- dataset_map() method 238
- DCGAN (deep convolutional GAN) 443
- decision trees 15–16
- decoder network 440
- decorators 531–532
- deep learning 2–13, 72–74
- “deep” in “deep learning” 7–8
- achievements 10–11
- AI (artificial intelligence) 2–3, 12–13
- computer vision
- convnets (convolutional neural networks) 221–230
- essential computer vision tasks 259–260
- image segmentation example 260–269
- interpreting what convnets learn 283–299
- modern convnet architecture patterns 269–282
- pretrained models 245–257
- training convnet on small dataset 230–245
- for time series
- different kinds of time-series tasks 301–302
- RNNs (recurrent neural networks) 317–333
- temperature-forecasting example 302–316
- forces driving advances of 20–25
- algorithms 22
- data 21–22
- democratization of deep learning 23–24
- hardware 20–21
- lasting potential 24–25
- wave of investment 22–23
- future of 502–508
- blending together deep learning and program synthesis 504–506
- lifelong learning and modular subroutine reuse 506–507
- long-term vision 507–508
- machine learning vs. program synthesis 504
- models as programs 503
- generalization in 136–142
- interpolation as source of generalization 139
- manifold hypothesis 137–138
- training data 141–142
- why deep learning works 139–141
- geometric interpretation of operations 47–48
- how to think about 476–477
- installing Keras and TensorFlow 73–74
- learning rules and representations from data 4–7
- limitations of 485–493
- automatons vs. intelligent agents 488
- climbing spectrum of generalization 492–493
- local generalization vs. extreme generalization 489–491
- purpose of intelligence 491–492
- risk of anthropomorphizing machine learning models 486–488
- machine learning 3–4, 13–19, 475–476
- back to neural networks 16–17
- decision trees, random forests, and gradient-boosting machines 15–16
- deep learning different, reasons for making 17
- early neural networks 13–14
- kernel methods 14–15
- modern machine learning landscape 17–19
- probabilistic modeling 13
- NLP (natural language processing)
- overview 334–336
- preparing text data 336–344
- representing groups of words 344–366
- sequence-to-sequence learning 382–398
- Transformer architecture 366–382
- overview 8–10
- short-term expectations 11–12
- DeepDream 414–421
- def statement 520–521
- democratization of deep learning 23–24
- Dense class 61–62
- Dense layers 29, 222, 479
- densely connected layers 89
- densely connected networks 479–481
- deploying model 178–183
- explaining work to stakeholders and setting expectations 178–179
- maintaining 183
- monitoring 182–183
- shipping inference model 179–182
- deploying model as REST API 179–180
- deploying model in browser 181–182
- deploying model on device 180–181
- inference model optimization 182
- depthwise separable convolution layer 278
- depthwise separable convolutions 278–280
- derivatives
- chaining 54–59
- automatic differentiation with computation graphs 55–58
- chain rule 54–55
- gradient tape in TensorFlow 58–59
- overview 49–50
- developing model 174–178
- beating baseline 176
- choosing evaluation protocol 175–176
- overfitting 177
- preparing data 174–175
- handling missing values 175
- value normalization 174–175
- vectorization 174
- regularizing and tuning model 178
- device, deploying model on 180–181
- dictionaries 516–517
- dim() function 29, 75
- dimensionality 32
- discriminator network 444
- discriminators 447
- double array 29
- download.file() utilities 261
- dropout
- adding 161–164
- using recurrent dropout to fight overfitting 324–327
- dunder 523
F
- failure modes 179
- feature engineering 17, 153–154
- feature extraction 196
- feature extractor model 290
- feature maps 224
- features
- ambiguous features 133
- extraction with pretrained models 246–253
- fast feature extraction without data augmentation 249–251
- feature extraction together with data augmentation 251–253
- rare features and spurious correlations 133–136
- feed-forward networks 317
- fg function 55
- filters 225
- filters, visualizing convnet 289–295
- fine-tuning pretrained models 254–257
- fit() method 29, 95, 99, 198, 216–218, 236, 467
- five-dimensional vector 32
- Flatten layer 222
- Flatten operation 481
- float16 weights 465
- float32 data 174
- float32 inputs 465
- float32 tensor 296
- float32 value 466
- float32 weight variables 465
- float64 tensor 467
- floating-point numbers 530
- floating-point precision 465–467
- for loop 38, 199, 319, 458, 504
- for statement 518–520
- for() loop 410
- Fourier transform 302
- framing machine learning problem 168–169
- full training loop 65
- Functional API 189–196
- access to layer connectivity 194–196
- multi-input, multi-output models 191–192
- simple example 190–191
- training multi-input, multi-output model 192–193
G
- GANs (generative adversarial networks) 432, 442–452
- adversarial network 448–452
- CelebA dataset 445–447
- discriminators 447
- generators 447–448
- schematic GAN implementation 443–444
- tricks 444–445
- Gated Recurrent Unit (GRU) layers 328
- GCS (Google Cloud Storage) 180, 472
- generalization 130–142
- climbing spectrum of 492–493
- improving 152–164
- dataset curation 152–153
- feature engineering 153–154
- regularizing model 155–164
- using early stopping 154
- in deep learning 136–142
- interpolation as source of generalization 139
- manifold hypothesis 137–138
- training data 141–142
- why deep learning works 139–141
- local generalization vs. extreme generalization 489–491
- underfitting and overfitting 131–136
- ambiguous features 133
- noisy training data 132
- rare features and spurious correlations 133–136
- generalized self-attention 370–371
- generative deep learning
- DeepDream 414–421
- GANs (generative adversarial networks) 442–452
- adversarial network 448–452
- CelebA dataset 445–447
- discriminators 447
- generators 447–448
- schematic GAN implementation 443–444
- tricks 444–445
- image generation with VAEs (variational autoencoders) 431–442
- concept vectors for image editing 433
- implementing VAE with Keras 436–441
- sampling from latent spaces of images 432
- variational autoencoders 434–436
- neural style transfer 422–431
- content loss 423–424
- neural style transfer in Keras 424–431
- style loss 424
- text generation 401–413
- generating sequence data 402
- history of generative deep learning for sequence generation 401–402
- implementing text generation with Keras 404–408
- sampling strategy, importance of 402–404
- text-generation callback with variable-temperature sampling 408–413
- generator network 444
- generator object 526
- generators
- defining with yield 526–527
- developing model 447–448
- geometric interpretation
- of deep learning operations 47–48
- of tensor operations 44–47
- geometric modules 508
- get_config() method 374–375
- gradient descent 49, 289
- gradient-based optimization 48–59
- backpropagation algorithm 54–59
- automatic differentiation with computation graphs 55–58
- chain rule 54–55
- gradient tape in TensorFlow 58–59
- derivatives 49–50
- gradients 50–51
- stochastic gradient descent 51–54
- tuning gradient descent parameters 147–149
- gradient-boosting machines 15–16
- gradients 50–51
- GradientTape API 83–84
- GradientTape object 63, 289
- GradientTape scope 291
- Gram matrix 424
- greedy sampling 402
- GRU (Gated Recurrent Unit) layers 328
H
- hardware 20–21
- hash() method 516
- heatmaps of class activation 295–299
- history object 99, 111
- holdout validation 143–144
- HSV (hue-saturation-value) format 5
- HyperModel class 457
- hyperparameter optimization 455–462
- automated machine learning 462
- crafting right search space 461
- using KerasTuner 456–461
- hypothesis space 6, 109, 269
I
- image classification 259
- image data 36–37
- image generation 431–442
- concept vectors for image editing 433
- implementing VAE with Keras 436–441
- sampling from latent spaces of images 432
- VAEs (variational autoencoders) 434–436
- image segmentation example 260–269
- image_dataset_from_directory() function 236
- imagenet_preprocess_input() function 249
- IMDB dataset 105–106, 345–347
- import statement 527–529
- include_top function 247
- increasing model capacity 150–152
- inference 101
- model optimization 182
- shipping inference model 179–182
- deploying model as REST API 179–180
- deploying model in browser 181–182
- deploying model on device 180–181
- inference model optimization 182
- training versus 210–211
- information arbitrage 333
- information distillation pipeline 288
- information-distillation process 7
- initialize() method 197
- input_shape function 247
- inputs object 190
- instance segmentation 260
- int32 tensors 347
- integer type 76
- integers 530
- intelligence 496–502
- as sensitivity to abstract analogies 497
- missing half of picture 501–502
- purpose of 491–492
- two poles of abstraction 498, 501
- cognition as combination of both kinds of abstraction 500–501
- program-centric analogy 499–500
- value-centric analogy 498–499
- intelligent agents 488
- intermediate layers 120–121
- interpolation 139
- interpretability 283–299
- visualizing convnet filters 289–295
- visualizing heatmaps of class activation 295–299
- visualizing intermediate activations 283–289
- investment 22–23
- iter method 525
- iterations
- general discussion of 527
- iterated K-fold validation with shuffling 145
- with for statement 518–520
K
- K-fold validation 124–128, 144–145
- Kaggle 509
- kaggle package 231
- Keras
- building Keras models 186–200
- Functional API 189–196
- mixing and matching different components 199–200
- Sequential model 187–189
- subclassing model class 196–199
- using right tool for job 200
- exploring ecosystem 509–510
- history with TensorFlow 71
- implementing DeepDream in 415–421
- implementing VAEs with 436–441
- installing 73–74
- neural style transfer in 424–431
- overview 69–70
- recurrent layer in 320–324
- text generation with 404–408
- preparing data 404–406
- Transformer-based sequence-to-sequence model 406–408
- using built-in training and evaluation loops 201–209
- monitoring and visualization with TensorBoard 208–209
- using callbacks 204–205
- writing own callbacks 205–207
- writing own metrics 202–203
- workflows 186
- writing training and evaluation loops 210–218
- complete training and evaluation loop 212–214
- fit() with custom training loop 216–218
- low-level usage of metrics 211–212
- tf_function() 215–216
- training versus inference 210–211
- Keras APIs 89–101
- compile step 95–98
- fit() method 99
- inference 101
- layers 89–93
- automatic shape inference 91–93
- composing layers with %>% (pipe operator) 93
- layer class 90–91
- models 94–95
- monitoring loss and metrics on validation data 99–100
- keras_model() constructor 191
- KerasTuner 456–461
- kernel function 15
- kernel methods 14–15
- kernel trick 14
- keyword arguments 521
L
- labels 120
- Large Hadron Collider (LHC) 16
- latent spaces 432
- Layer class 196
- layer_conv_1d layer 314
- layer_conv_2d layer 221
- layer_conv_3d layer 314
- layer_embedding layer 363
- layer_gru layers 325
- layer_lstm layers 325
- layer_max_pooling_2d layer 221
- layer_multi_head_attention layer 371
- layer_settings vector 421
- layer_simple_dense() layer 92
- layer_text_vectorization 340–344
- layer_text_vectorization layer 347
- layers, Keras APIs 89–93
- automatic shape inference 91–93
- composing layers with %>% (pipe operator) 93
- layer class 90–91
- learning_rate argument 96
- learning_rate factor 52
- length() function 75
- LHC (Large Hadron Collider) 16
- linear classifier in TensorFlow 84–89
- lists 512–514
- local generalization 489–491
- log() function 75
- logistic regression 13
- logs argument 206
- loss function
- multiclass classification handling 120
- picking 98
- LSTM (long short-term memory) 20, 316, 399
M
- machine learning 13–19
- back to neural networks 16–17
- decision trees 15–16
- deep learning and 3–4, 17, 475–476
- defining task 168–174
- collecting dataset 169–173
- exploring data 173
- framing problem 168–169
- metric for success 173–174
- deploying model 178–183
- explaining work to stakeholders and setting expectations 178–179
- maintaining 183
- monitoring 182–183
- shipping inference model 179–182
- developing model 174–178
- beating baseline 176
- choosing evaluation protocol 175–176
- overfitting 177
- preparing data 174–175
- regularizing and tuning model 178
- early neural networks 13–14
- evaluating machine learning models 142–146
- common-sense baselines 145–146
- model evaluation protocol 146
- training, validation, and test sets 142–145
- generalization 130–142
- improving 152–164
- in deep learning 136–142
- underfitting and overfitting 131–136
- gradient-boosting machines 15–16
- improving model fit 146–152
- increasing model capacity 150–152
- leveraging architecture priors 149–150
- tuning gradient descent parameters 147–149
- kernel methods 14–15
- modern machine learning landscape 17–19
- probabilistic modeling 13
- program synthesis vs. 504
- random forests 15–16
- risk of anthropomorphizing 486–488
- universal machine learning workflow 478–479
- MAE (mean absolute error) 124, 310
- maintaining model 183
- manifold hypothesis 137–138
- map_func argument 348
- masking 361–363
- matrices (rank 2 tensors) 32
- max tensor operation 228
- max-pooling operation 228–230
- MaxPooling2D layers 222, 266
- mean squared error (MSE) 123, 312, 481
- Metric instances 216
- metrics
- low-level usage of 211–212
- writing 202–203
- metrics active property 216
- metrics property 99
- MHR (modularity, hierarchy, and reuse) formula 269
- mini Xception-like model 280–282
- mini-batch SGD (stochastic gradient descent) 52
- MirroredStrategy object 469
- missing values 175
- mixed precision 465–467
- floating-point precision 465–467
- in practice 467
- model checkpoint callbacks 204–205
- model ensembling 462–464
- model parallelism 468
- model subclassing 186
- model$layers property 195
- modules 527–529
- monitoring model 182–183
- MSE (mean squared error) 123, 312, 481
- mse loss function 113
- multi-GPU training 468–471
- single-host, multidevice synchronous training 468–471
- two or more GPUs 468
- multi-head attention 371–372
- multi-input, multi-output models
- overview 191–192
- training 192–193
- multiclass classification example 114–121
- building model 116–117
- generating predictions on new data 119–120
- handling labels and loss 120
- large intermediate layers, importance of 120–121
- preparing data 115–116
- Reuters dataset 114–115
- validating approach 117–119
- MultiHeadAttention layers 394
- multilabel categorical classification 480
N
- N-gram tokenization 338
- NaiveDense class 62
- NaiveSequential class 62
- natural language processing. See NLP
- network architectures 479–483
- convnets 481
- densely connected networks 479–481
- RNNs 482
- Transformers 482–483
- network size 155–158
- neural networks 7, 477
- advancements of 16–17
- binary classification example 105–114
- building model 108–110
- IMDB dataset 105–106
- preparing data 107
- using trained model to generate predictions on new data 113
- validating approach 110–113
- data representations for 31–37
- data batches 35
- image data 36–37
- key attributes 33–34
- manipulating tensors in R 34
- matrices (rank 2 tensors) 32
- rank 3 and higher-rank tensors 32–33
- real-world examples of data tensors 35
- scalars (rank 0 tensors) 31
- time-series data or sequence data 36
- vector data 35–36
- vectors (rank 1 tensors) 31–32
- video data 37
- early iterations of 13–14
- example of 27–31, 59–66
- evaluating model 66
- full training loop 65
- reimplementing from scratch in TensorFlow 61–63
- running one training step 63–65
- gradient-based optimization 48–59
- backpropagation algorithm 54–59
- derivatives 49–50
- gradients 50–51
- stochastic gradient descent 51–54
- multiclass classification example 114–121
- building model 116–117
- generating predictions on new data 119–120
- handling labels and loss 120
- large intermediate layers, importance of 120–121
- preparing data 115–116
- Reuters dataset 114–115
- validating approach 117–119
- regression example 121–128
- Boston housing price dataset 122
- building model 123–124
- generating predictions on new data 128
- preparing data 122–123
- validating approach using K-fold validation 124–128
- tensor operations 37–48
- broadcasting 40–41
- element-wise operations 38–39
- geometric interpretation of 44–48
- tensor product 41–43
- tensor reshaping 43–44
- neural style transfer 422–431
- content loss 423–424
- neural style transfer in Keras 424–431
- style loss 424
- new_layer_class() function 90
- next method 525
- ngrams = N argument 350
- NLP (natural language processing) overview 334–336
- preparing text data 336–344
- text splitting (tokenization) 338–339
- text standardization 337–338
- using layer_text_vectorization 340–344
- vocabulary indexing 339–340
- representing groups of words 344–366
- bag-of-words approach 347–354
- preparing IMDB movie reviews data 345–347
- sequence model approach 355–366
- sequence-to-sequence learning 382–398
- machine translation example 383–387
- with RNNs 387–392
- with Transformer 392–398
- Transformer architecture 366–382
- multi-head attention 371–372
- self-attention 366–371
- Transformer encoder 372–381
- when to use sequence models over bag-of-words models 381–382
- noisy training data 132
- normalization 174–175, 275–278
O
- object detection 260
- objective function 9
- Occam’s razor 159
- on_batch_* method 206
- on_epoch_* method 206
- one-hot encoding 116
- optimization 131
- Optimizer instance 64
- output feature map 224
- output_mode argument 353
- overfitting 31, 131–136
- ambiguous features 133
- developing model 177
- noisy training data 132
- rare features and spurious correlations 133–136
- using recurrent dropout to fight 324–327
P
- pack arguments 521
- packing tuples 515–516
- padding 226–227, 361–363
- patience value 460
- pipe operator (%>%) 93
- plot() function 194, 262
- plot() method 111
- positional encoding 378–379
- PositionalEmbedding layers 396, 483
- POSIXct format 304
- predict() loops 291
- predict() method 113, 198, 249
- predictions
- binary classification example 113
- multiclass classification example 119–120
- regression 128
- pretrained models 245–257
- feature extraction with 246–253
- fast feature extraction without data augmentation 249–251
- feature extraction together with data augmentation 251–253
- fine-tuning 254–257
- pretrained word embeddings 359, 363–366
- print() method 188, 320
- probabilistic modeling 13
- program synthesis 503
- blending together deep learning and 504–506
- integrating deep learning modules and algorithmic modules into hybrid systems 504–505
- using deep learning to guide program search 505–506
- machine learning vs. 504
- program-centric analogy 499–500
- program-space intuition 508
- programs, models as 503
- progressive disclosure of complexity 186
- [:punct:] class 385
- Python
- for R users 519–520
- decorators 531–532
- defining classes with class 522–526
- defining functions with def 520–521
- defining generators with yield 526–527
- import and modules 527–529
- integers and floats 530
- iteration closing remarks 527
- iteration with for 518–520
- R vectors 530–531
- whitespace 511
- with and context management 532
- R interfaces and 71–72
Q
- query-key-value model 370–371
R
- R (language)
- manipulating tensors in 34
- Python for
- container types 512–517
- decorators 531–532
- defining classes with class 522–526
- defining functions with def 520–521
- defining generators with yield 526–527
- import and modules 527–529
- integers and floats 530
- interfaces from Python 71–72
- iteration closing remarks 527
- iteration with for 518–520
- R vectors 530–531
- whitespace 511
- with and context management 532
- random forests 15–16
- randomized A/B testing 182
- rank 3 tensors 32–33
- raster object 299
- reconstruction loss 436
- recurrent neural networks. See RNNs
- regression 121–128
- Boston housing price dataset 122
- building model 123–124
- generating predictions on new data 128
- preparing data 122–123
- validating approach using K-fold validation 124–128
- regularization loss 436
- regularizing model 155–164, 178
- adding dropout 161–164
- adding weight regularization 159–161
- reducing network size 155–158
- relu (rectified linear unit) 109
- relu activation 277
- relu operation 38
- reset_state() method 203
- reshaping tensors 77–78
- residual connections 272–275
- response map 225
- REST API 179–180
- result() method 203
- return_sequences argument 320
- Reuters dataset 114–115
- RGB (red-green-blue) format 5
- RMSE (root mean squared error) 202
- RMSprop optimizer 332
- rmsprop optimizer 60, 110
- RNNs (recurrent neural networks) 317–333
- overview 482
- recurrent layer in Keras 320–324
- stacking recurrent layers 327–329
- using bidirectional RNNs 329–332
- using recurrent dropout to fight overfitting 324–327
- ROC (receiver operating characteristic) 173
S
- samples axis 35
- sampling bias 172
- sampling strategy 402–404
- scalars (rank 0 tensors) 31
- scale() function 123, 175
- scaling-up model training 464–472
- multi-GPU training 468–471
- single-host, multidevice synchronous training 468–471
- two or more GPUs 468
- speeding up training on GPU with mixed precision 465–467
- floating-point precision 465–467
- in practice 467
- TPU training 471–472
- schematic GAN implementation 443–444
- second-order gradients 84
- segmentation mask 260
- self-attention 366–371
- semantic segmentation 260
- SeparableConv2D layers 290, 481
- sequence generation
- data for 402
- history of generative deep learning for 401–402
- sequence model approach 355–366
- learning word embeddings with the embedding layer 359–361
- padding and masking 361–363
- practical example 355–356
- using pretrained word embeddings 363–366
- when to use bag-of-words approach over 381–382
- word embeddings 357–359
- sequence-to-sequence learning 382–398, 482
- machine translation example 383–387
- with RNNs 387–392
- with Transformer 392–398
- Transformer decoder 393–396
- Transformer for machine translation 396–398
- sequence-to-sequence model 370
- Sequential class 62
- Sequential model 186–189
- sets 517
- SGD (stochastic gradient descent) 51–54, 95
- shape() function 76
- shaping tensors 77–78
- shortcut rule 493–495
- shuffling, iterated K-fold validation with 145
- sigmoid activation 234, 480
- simple model 159
- single words (unigrams) with binary encoding 347–350
- slicing tensors 78–79
- softmax activation 117, 480
- softmax classification layer 29
- softmax temperature 403
- sparse_categorical_crossentropy loss function 121
- spurious correlations 133–136
- stacking recurrent layers 327–329
- stakeholders 178–179
- stemming 338
- step fusing 472
- steps_per_execution argument 472
- stochastic gradient descent (SGD) 51–54, 95
- stochastic sampling 403
- StopIteration exception 525
- strides, convolution 227
- style loss 424
- subclassing model class 196–199
- rewriting previous example as subclassed model 197–199
- what subclassed models don’t support 199
- subroutines 506–507
- SVM (Support Vector Machine) 14
- symbolic AI 3
- symbolic tensor 190
T
- tanh activation 113
- target leaking 173
- target_vectorization layer 397
- targets array 308
- temperature value 403
- temperature-forecasting example 302–316
- 1D convolutional model 314–315
- basic machine learning model 311–313
- first recurrent baseline 316
- non-machine learning baseline 310–311
- preparing data 306–309
- Tensor objects 213
- tensor operations 37
- tensor product operation 41–43
- tensor slicing 34
- TensorBoard 208–209
- TensorFlow
- example of neural networks 61–63
- batch generator 63
- Dense class 61–62
- Sequential class 62
- gradient tape in 58–59
- Keras
- history with TensorFlow 71
- installing 73–74
- overview 69–70
- Keras APIs 89–101
- compile step 95–98
- fit() method 99
- inference 101
- layers 89–93
- models 94–95
- monitoring loss and metrics on validation data 99–100
- picking loss function 98
- overview 69
- Python and R interfaces 71–72
- setting up deep learning workspace 72–74
- TensorFlow Serving 180
- tensors 31–37
- attributes 75–89
- broadcasting 79
- constant tensors and variables 81–82
- GradientTape API 83–84
- linear classifier in TensorFlow example 84–89
- operations 82–83
- shape and reshaping 77–78
- slicing 78–79
- tf module 80–81
- data batches 35
- image data 36–37
- key attributes 33–34
- manipulating tensors in R 34
- matrices (rank 2 tensors) 32
- operations 37–48
- broadcasting 40–41
- element-wise operations 38–39
- geometric interpretation of 44–47
- geometric interpretation of deep learning 47–48
- tensor product 41–43
- tensor reshaping 43–44
- overview 74–75
- rank 3 and higher-rank tensors 32–33
- real-world examples of data tensors 35
- scalars (rank 0 tensors) 31
- time-series data or sequence data 36
- vector data 35–36
- vectors (rank 1 tensors) 31–32
- video data 37
- TensorShape object 76
- test sets 142–145
- test_step() function 214
- text data 336–344
- text splitting (tokenization) 338–339
- text standardization 337–338
- using layer_text_vectorization 340–344
- vocabulary indexing 339–340
- text generation 401–413
- generating sequence data 402
- history of generative deep learning for sequence generation 401–402
- implementing text generation with Keras 404–408
- preparing data 404–406
- Transformer-based sequence-to-sequence model 406–408
- sampling strategy, importance of 402–404
- text-generation callback with variable-temperature sampling 408–413
- text standardization 337–338
- text_dataset_from_directory utility 346
- text_vectorization layer 354
- text-classification Transformer 379–381
- TextVectorization layers 384
- tf module 80–81
- tf_function() 215–216, 293, 410
- TF-IDF (term frequency, inverse document frequency) encoding 352–354
- tf.string dtype tensors 342
- tf.string tensors 347
- tf.TensorShape object 76
- tf$io module functions 267
- tf$Variable class 81
- tfdataset functions 310
- tfdataset instance 236
- tfdataset iteration loop 215
- tfdataset iterator 237
- tfdataset object 99, 213, 236, 307, 342, 469
- tfdataset pipeline 342
- theta angle 153
- time series
- different kinds of time-series tasks 301–302
- RNNs (recurrent neural networks) 317–333
- recurrent layer in Keras 320–324
- stacking recurrent layers 327–329
- using bidirectional RNNs 329–332
- using recurrent dropout to fight overfitting 324–327
- temperature-forecasting example 302–316
- 1D convolutional model 314–315
- basic machine learning model 311–313
- first recurrent baseline 316
- non-machine learning baseline 310–311
- preparing data 306–309
- time-series data (sequence data) 36
- tokenization (text splitting) 338–339
- TPU (Tensor Processing Unit) 21, 471–472
- TPUStrategy scope 471
- train_step() method 216
- training 142–145
- convnets 230–245
- building model 234–235
- data preprocessing 235–241
- downloading data 231–233
- relevance of deep learning for small data problems 230–231
- using data augmentation 241–245
- data 141–142
- scaling-up model training 464–472
- multi-GPU training 468–471
- speeding up training on GPU with mixed precision 465–467
- TPU training 471–472
- training argument 210
- training loops 10, 48
- inference versus 210–211
- using built-in 201–209
- monitoring and visualization with TensorBoard 208–209
- using callbacks 204–205
- writing own callbacks 205–207
- writing own metrics 202–203
- writing 210–218
- complete training and evaluation loop 212–214
- fit() with custom training loop 216–218
- low-level usage of metrics 211–212
- tf_function() 215–216
- training versus inference 210–211
- training step 63–65
- Transformer
- architecture 366–382
- multi-head attention 371–372
- self-attention 366–371
- when to use sequence models over bag-of-words models 381–382
- overview 482–483
- sequence-to-sequence model 406–408
- Transformer decoder 373
- Transformer encoder 372–381
- text-classification Transformer 379–381
- using positional encoding to reinject order information 378–379
- TransformerDecoder 483
- TransformerEncoder 483
- translation invariant 481
- tricks, GANs (generative adversarial networks) 444–445
- tuning model 178
- tuples 514–516
- Turing test 3
U
- uint8 integers 465
- underfitting 131–136
- ambiguous features 133
- noisy training data 132
- rare features and spurious correlations 133–136
- unordered containers 517
- unpack arguments 521
- unpacking tuples 515–516
- untar() utilities 261
- update_state() method 202
- update_weights function 64
V
- VAEs (variational autoencoders) 431–442
- concept vectors for image editing 433
- image generation 434–436
- implementing with Keras 436–441
- sampling from latent spaces of images 432
- variational autoencoders 434–436
- validation 142–145
- holdout validation 143–144
- iterated K-fold validation with shuffling 145
- K-fold validation 144–145
- monitoring loss and metrics on 99–100
- validation metrics 175
- validation_data argument 100, 110, 239
- value-centric analogy 498–499
- vanishing-gradient problem 321
- Variable instance 58
- variable_dtype property 467
- variable-temperature sampling 408–413
- variables 81–82
- variational autoencoders 434–436
- vector data 35–36
- vectorization 174, 336, 476
- vectors (rank 1 tensors) 31–32
- VGG16 model 249
- video data 37
- VM (virtual machine) 468
- vocabulary indexing 339–340
W
- weight regularization 159–161
- weight regularizer instances 159
- weights function 247
- weights property 62
- whitespace 511
- with statement 532
- word embeddings 357–359
- word-level tokenization 338
- words 344–366
- bag-of-words approach 347–354
- bigrams with binary encoding 350–351
- bigrams with TF-IDF encoding 352–354
- single words (unigrams) with binary encoding 347–350
- preparing IMDB movie reviews data 345–347
- sequence model approach 355–366
- learning word embeddings with embedding layer 359–361
- padding and masking 361–363
- practical example 355–356
- using pretrained word embeddings 363–366
- word embeddings 357–359
- WSL (Windows Subsystem for Linux) 72
X
- xception_preprocess_input utility function 296
Z
- zip_lists() helper function 64