A
- ablation studies 272
- abstraction
- abstract analogies 497
- two poles of 498, 501
- cognition as combination of both kinds of abstraction 500–501
- program-centric analogy 499–500
- value-centric analogy 498–499
- Abstraction and Reasoning Corpus (ARC) 495
- activations
- CAM (class activation map) visualization 295–299
- visualizing heatmaps of class 295–299
- visualizing intermediate 283–289
- adapt() method 342
- adversarial examples 486
- adversarial network 448–452
- AGI (artificial general intelligence) 508
- AI (artificial intelligence)
- deep learning and 2–3
- promise of 12–13
- setting course toward greater generality in 493–496
- new target 495–496
- shortcut rule 493–495
- various approaches to 475
- AI summer 476
- algorithmic modules 508
- algorithms 22
- all_dims() object 79
- ambiguous features 133
- Analytical Engine 3
- append() method 512
- ARC (Abstraction and Reasoning Corpus) 495
- architecture patterns 269–282
- batch normalization 275–278
- depthwise separable convolutions 278–280
- MHR (modularity, hierarchy, and reuse) formula 269
- mini Xception-like model 280–282
- residual connections 272–275
- architecture priors 149–150
- array objects 31
- array_reshape() function 29
- artificial general intelligence (AGI) 508
- artificial intelligence. See AI
- arXiv 509
- as.data.frame() method 112
- assign method 82
- automated machine learning 462
- automatic differentiation with computation graphs 55–58
- automatic shape inference 91–93
- automatons, intelligent agents vs. 488
B
- backpropagation algorithm 9, 54–59
- automatic differentiation with computation graphs 55–58
- chain rule 54–55
- gradient tape in TensorFlow 58–59
- bag-of-words approach 347–354
- bigrams with binary encoding 350–351
- bigrams with TF-IDF encoding 352–354
- single words (unigrams) with binary encoding 347–350
- when to use sequence models over 381–382
- bag-of-words models 338
- baseline, beating 176
- Basic Linear Algebra Subprograms (BLAS) 39
- batch generator 63
- batch_size argument 357
- BatchNormalization layers 276
- best practices
- getting most out of models 455–464
- hyperparameter optimization 455–462
- model ensembling 462–464
- scaling-up model training 464–472
- multi-GPU training 468–471
- speeding up training on GPU with mixed precision 465–467
- TPU training 471–472
- bias vector 277
- bidirectional layer 331
- bidirectional RNNs 329–332
- bigrams
- with binary encoding 350–351
- with TF-IDF encoding 352–354
- binary classification example 105–114
- building model 108–110
- IMDB dataset 105–106
- preparing data 107
- using trained model to generate predictions on new data 113
- validating approach 110–113
- binary encoding 350–351
- binary_crossentropy loss function 110
- BLAS (Basic Linear Algebra Subprograms) 39
- border effects 226–227
- Boston housing price dataset 122
- broadcasting 40–41, 79
- browser, deploying model in 181–182
- build() method 90, 187, 457
C
- call() method 61, 90, 190, 374
- Callback class 205
- callbacks 204–205
- early stopping callbacks 204–205
- text-generation callback with variable-temperature sampling 408–413
- writing 205–207
- CAM (class activation map) visualization 295–299
- categorical encoding 116
- causal padding 394
- CelebA dataset 445–447
- chain rule 54–55
- channels-first convention 37
- channels-last convention 37
- character-level tokenization 338
- class objects 522
- class statement 522
- classes 522–526
- iterators 525–526
- underscores 523–525
- classification
- binary classification example 105–114
- building model 108–110
- IMDB dataset 105–106
- preparing data 107
- using trained model to generate predictions on new data 113
- validating approach 110–113
- multiclass classification example 114–121
- building model 116–117
- generating predictions on new data 119–120
- handling labels and loss 120
- large intermediate layers, importance of 120–121
- preparing data 115–116
- Reuters dataset 114–115
- validating approach 117–119
- classname argument 375
- cognition 500–501
- cognitive automation 493
- combinatorial explosion 505
- common-sense baselines 145–146
- compilation step 29
- compile step, Keras APIs 95–98
- compile() method 95
- computation graphs, automatic differentiation with 55–58
- compute_dtype property 467
- compute_mask() method 361
- computer vision
- convnets (convolutional neural networks) 221–230
- convolution operation 223–227
- max-pooling operation 228–230
- essential computer vision tasks 259–260
- image segmentation example 260–269
- interpreting what convnets learn 283–299
- visualizing convnet filters 289–295
- visualizing heatmaps of class activation 295–299
- visualizing intermediate activations 283–289
- modern convnet architecture patterns 269–282
- batch normalization 275–278
- depthwise separable convolutions 278–280
- MHR (modularity, hierarchy, and reuse) formula 269
- mini Xception-like model 280–282
- residual connections 272–275
- pretrained models 245–257
- feature extraction with pretrained model 246–253
- fine-tuning pretrained model 254–257
- training convnet on small dataset 230–245
- building model 234–235
- data preprocessing 235–241
- downloading data 231–233
- relevance of deep learning for small data problems 230–231
- using data augmentation 241–245
- concept drift 171
- concept vectors 433
- constant tensors 81–82
- container types 512–517
- dictionaries 516–517
- lists 512–514
- sets 517
- tuples 514–516
- content loss 423–424
- context management 532
- conv_base model 249
- Conv1D layers 481
- Conv2D layers 222, 266, 481
- Conv2DTranspose layers 266
- Conv3D layers 481
- convnets (convolutional neural networks) 221–230
- architecture patterns 269–282
- batch normalization 275–278
- depthwise separable convolutions 278–280
- MHR (modularity, hierarchy, and reuse) formula 269
- mini Xception-like model 280–282
- residual connections 272–275
- convolution operation 223–227
- border effects and padding 226–227
- convolution strides 227
- interpretability 283–299
- visualizing convnet filters 289–295
- visualizing heatmaps of class activation 295–299
- visualizing intermediate activations 283–289
- max-pooling operation 228–230
- overview 481
- training on small dataset 230–245
- building model 234–235
- data preprocessing 235–241
- downloading data 231–233
- relevance of deep learning for small data problems 230–231
- using data augmentation 241–245
- convolution kernel 225
- convolution operation 223–227
- border effects and padding 226–227
- convolution strides 227
- convolutional base 246
- convolutional neural networks 221
- cost function 9
- cross-entropy 110
- CUDA 73–74
- cuDNN 73–74
- cuDNN kernel 279
D
- data
- collecting 169–173
- beware of nonrepresentative data 171–173
- investing in data annotation infrastructure 170–171
- convnets (convolutional neural networks)
- downloading 231–233
- preprocessing 235–241
- driving advances of deep learning 21–22
- exploring 173
- learning rules and representations from 4–7
- metric for success 173–174
- neural networks, data representations for 31–37
- data batches 35
- image data 36–37
- key attributes 33–34
- manipulating tensors in R 34
- matrices (rank 2 tensors) 32
- rank 3 and higher-rank tensors 32–33
- real-world examples of data tensors 35
- scalars (rank 0 tensors) 31
- time-series data or sequence data 36
- vector data 35–36
- vectors (rank 1 tensors) 31–32
- video data 37
- preparing 174–175
- handling missing values 175
- value normalization 174–175
- vectorization 174
- data augmentation 230, 241–245
- fast feature extraction without 249–251
- feature extraction together with 251–253
- data distillation 29
- data parallelism 468
- data vectorization 174
- dataset curation 152–153
- Dataset object 100
- dataset_map() method 238
- DCGAN (deep convolutional GAN) 443
- decision trees 15–16
- decoder network 440
- decorators 531–532
- deep learning 2–13, 72–74
- “deep” in “deep learning” 7–8
- achievements 10–11
- AI (artificial intelligence) 2–3, 12–13
- computer vision
- convnets (convolutional neural networks) 221–230
- essential computer vision tasks 259–260
- image segmentation example 260–269
- interpreting what convnets learn 283–299
- modern convnet architecture patterns 269–282
- pretrained models 245–257
- training convnet on small dataset 230–245
- for time series
- different kinds of time-series tasks 301–302
- RNNs (recurrent neural networks) 317–333
- temperature-forecasting example 302–316
- forces driving advances of 20–25
- algorithms 22
- data 21–22
- democratization of deep learning 23–24
- hardware 20–21
- lasting potential 24–25
- wave of investment 22–23
- future of 502–508
- blending together deep learning and program synthesis 504–506
- lifelong learning and modular subroutine reuse 506–507
- long-term vision 507–508
- machine learning vs. program synthesis 504
- models as programs 503
- generalization in 136–142
- interpolation as source of generalization 139
- manifold hypothesis 137–138
- training data 141–142
- why deep learning works 139–141
- geometric interpretation of operations 47–48
- how to think about 476–477
- installing Keras and TensorFlow 73–74
- learning rules and representations from data 4–7
- limitations of 485–493
- automatons vs. intelligent agents 488
- climbing spectrum of generalization 492–493
- local generalization vs. extreme generalization 489–491
- purpose of intelligence 491–492
- risk of anthropomorphizing machine learning models 486–488
- machine learning 3–4, 13–19, 475–476
- back to neural networks 16–17
- decision trees, random forests, and gradient-boosting machines 15–16
- deep learning different, reasons for making 17
- early neural networks 13–14
- kernel methods 14–15
- modern machine learning landscape 17–19
- probabilistic modeling 13
- NLP (natural language processing)
- overview 334–336
- preparing text data 336–344
- representing groups of words 344–366
- sequence-to-sequence learning 382–398
- Transformer architecture 366–382
- overview 8–10
- short-term expectations 11–12
- DeepDream 414–421
- def statement 520–521
- democratization of deep learning 23–24
- Dense class 61–62
- Dense layers 29, 222, 479
- densely connected layers 89
- densely connected networks 479–481
- deploying model 178–183
- explaining work to stakeholders and setting expectations 178–179
- maintaining 183
- monitoring 182–183
- shipping inference model 179–182
- deploying model as REST API 179–180
- deploying model in browser 181–182
- deploying model on device 180–181
- inference model optimization 182
- depthwise separable convolution layer 278
- depthwise separable convolutions 278–280
- derivatives
- chaining 54–59
- automatic differentiation with computation graphs 55–58
- chain rule 54–55
- gradient tape in TensorFlow 58–59
- overview 49–50
- developing model 174–178
- beating baseline 176
- choosing evaluation protocol 175–176
- overfitting 177
- preparing data 174–175
- handling missing values 175
- value normalization 174–175
- vectorization 174
- regularizing and tuning model 178
- device, deploying model on 180–181
- dictionaries 516–517
- dim() function 29, 75
- dimensionality 32
- discriminator network 444
- discriminators 447
- double array 29
- download.file() utilities 261
- dropout
- adding 161–164
- using recurrent dropout to fight overfitting 324–327
- dunder 523
F
- failure modes 179
- feature engineering 17, 153–154
- feature extraction 196
- feature extractor model 290
- feature maps 224
- features
- ambiguous features 133
- extraction with pretrained models 246–253
- fast feature extraction without data augmentation 249–251
- feature extraction together with data augmentation 251–253
- rare features and spurious correlations 133–136
- feed-forward networks 317
- fg function 55
- filters 225
- filters, visualizing convnet 289–295
- fine-tuning pretrained models 254–257
- fit() method 29, 95, 99, 198, 216–218, 236, 467
- five-dimensional vector 32
- Flatten layer 222
- Flatten operation 481
- float16 weights 465
- float32 data 174
- float32 inputs 465
- float32 tensor 296
- float32 value 466
- float32 weight variables 465
- float64 tensor 467
- floating-point numbers 530
- floating-point precision 465–467
- for loop 38, 199, 319, 458, 504
- for statement 518–520
- for() loop 410
- Fourier transform 302
- framing machine learning problem 168–169
- full training loop 65
- Functional API 189–196
- access to layer connectivity 194–196
- multi-input, multi-output models 191–192
- simple example 190–191
- training multi-input, multi-output model 192–193
G
- GANs (generative adversarial networks) 432, 442–452
- adversarial network 448–452
- CelebA dataset 445–447
- discriminators 447
- generators 447–448
- schematic GAN implementation 443–444
- tricks 444–445
- Gated Recurrent Unit (GRU) layers 328
- GCS (Google Cloud Storage) 180, 472
- generalization 130–142
- climbing spectrum of 492–493
- improving 152–164
- dataset curation 152–153
- feature engineering 153–154
- regularizing model 155–164
- using early stopping 154
- in deep learning 136–142
- interpolation as source of generalization 139
- manifold hypothesis 137–138
- training data 141–142
- why deep learning works 139–141
- local generalization vs. extreme generalization 489–491
- underfitting and overfitting 131–136
- ambiguous features 133
- noisy training data 132
- rare features and spurious correlations 133–136
- generalized self-attention 370–371
- generative deep learning
- DeepDream 414–421
- GANs (generative adversarial networks) 442–452
- adversarial network 448–452
- CelebA dataset 445–447
- discriminators 447
- generators 447–448
- schematic GAN implementation 443–444
- tricks 444–445
- image generation with VAEs (variational autoencoders) 431–442
- concept vectors for image editing 433
- implementing VAE with Keras 436–441
- sampling from latent spaces of images 432
- variational autoencoders 434–436
- neural style transfer 422–431
- content loss 423–424
- neural style transfer in Keras 424–431
- style loss 424
- text generation 401–413
- generating sequence data 402
- history of generative deep learning for sequence generation 401–402
- implementing text generation with Keras 404–408
- sampling strategy, importance of 402–404
- text-generation callback with variable-temperature sampling 408–413
- generator network 444
- generator object 526
- generators
- defining with yield 526–527
- developing model 447–448
- geometric interpretation
- of deep learning operations 47–48
- of tensor operations 44–47
- geometric modules 508
- get_config() method 374–375
- gradient descent 49, 289
- gradient-based optimization 48–59
- backpropagation algorithm 54–59
- automatic differentiation with computation graphs 55–58
- chain rule 54–55
- gradient tape in TensorFlow 58–59
- derivatives 49–50
- gradients 50–51
- stochastic gradient descent 51–54
- tuning gradient descent parameters 147–149
- gradient-boosting machines 15–16
- gradients 50–51
- GradientTape API 83–84
- GradientTape object 63, 289
- GradientTape scope 291
- Gram matrix 424
- greedy sampling 402
- GRU (Gated Recurrent Unit) layers 328
H
- hardware 20–21
- hash() method 516
- heatmaps of class activation 295–299
- history object 99, 111
- holdout validation 143–144
- HSV (hue-saturation-value) format 5
- HyperModel class 457
- hyperparameter optimization 455–462
- automated machine learning 462
- crafting right search space 461
- using KerasTuner 456–461
- hypothesis space 6, 109, 269
I
- image classification 259
- image data 36–37
- image generation 431–442
- concept vectors for image editing 433
- implementing VAE with Keras 436–441
- sampling from latent spaces of images 432
- VAEs (variational autoencoders) 434–436
- image segmentation example 260–269
- image_dataset_from_directory() function 236
- imagenet_preprocess_input() function 249
- IMDB dataset 105–106, 345–347
- import statement 527–529
- include_top function 247
- increasing model capacity 150–152
- inference 101
- model optimization 182
- shipping inference model 179–182
- deploying model as REST API 179–180
- deploying model in browser 181–182
- deploying model on device 180–181
- inference model optimization 182
- training versus 210–211
- information arbitrage 333
- information distillation pipeline 288
- information-distillation process 7
- initialize() method 197
- input_shape function 247
- inputs object 190
- instance segmentation 260
- int32 tensors 347
- integer type 76
- integers 530
- intelligence 496–502
- as sensitivity to abstract analogies 497
- missing half of picture 501–502
- purpose of 491–492
- two poles of abstraction 498, 501
- cognition as combination of both kinds of abstraction 500–501
- program-centric analogy 499–500
- value-centric analogy 498–499
- intelligent agents 488
- intermediate layers 120–121
- interpolation 139
- interpretability 283–299
- visualizing convnet filters 289–295
- visualizing heatmaps of class activation 295–299
- visualizing intermediate activations 283–289
- investment 22–23
- iter method 525
- iterations
- general discussion of 527
- iterated K-fold validation with shuffling 145
- with for statement 518–520
K
- K-fold validation 124–128, 144–145
- Kaggle 509
- kaggle package 231
- Keras
- building Keras models 186–200
- Functional API 189–196
- mixing and matching different components 199–200
- Sequential model 187–189
- subclassing model class 196–199
- using right tool for job 200
- exploring ecosystem 509–510
- history with TensorFlow 71
- implementing DeepDream in 415–421
- implementing VAEs with 436–441
- installing 73–74
- neural style transfer in 424–431
- overview 69–70
- recurrent layer in 320–324
- text generation with 404–408
- preparing data 404–406
- Transformer-based sequence-to-sequence model 406–408
- using built-in training and evaluation loops 201–209
- monitoring and visualization with TensorBoard 208–209
- using callbacks 204–205
- writing own callbacks 205–207
- writing own metrics 202–203
- workflows 186
- writing training and evaluation loops 210–218
- complete training and evaluation loop 212–214
- fit() with custom training loop 216–218
- low-level usage of metrics 211–212
- tf_function() 215–216
- training versus inference 210–211
- Keras APIs 89–101
- compile step 95–98
- fit() method 99
- inference 101
- layers 89–93
- automatic shape inference 91–93
- composing layers with %>% (pipe operator) 93
- layer class 90–91
- models 94–95
- monitoring loss and metrics on validation data 99–100
- keras_model() constructor 191
- KerasTuner 456–461
- kernel function 15
- kernel methods 14–15
- kernel trick 14
- keyword arguments 521
L
- labels 120
- Large Hadron Collider (LHC) 16
- latent spaces 432
- Layer class 196
- layer_conv_1d layer 314
- layer_conv_2d layer 221
- layer_conv_3d layer 314
- layer_embedding layer 363
- layer_gru layers 325
- layer_lstm layers 325
- layer_max_pooling_2d layer 221
- layer_multi_head_attention layer 371
- layer_settings vector 421
- layer_simple_dense() layer 92
- layer_text_vectorization 340–344
- layer_text_vectorization layer 347
- layers, Keras APIs 89–93
- automatic shape inference 91–93
- composing layers with %>% (pipe operator) 93
- layer class 90–91
- learning_rate argument 96
- learning_rate factor 52
- length() function 75
- LHC (Large Hadron Collider) 16
- linear classifier in TensorFlow 84–89
- lists 512–514
- local generalization 489–491
- log() function 75
- logistic regression 13
- logs argument 206
- loss function
- multiclass classification handling 120
- picking 98
- LSTM (long short-term memory) 20, 316, 399
M
- machine learning 13–19
- back to neural networks 16–17
- decision trees 15–16
- deep learning and 3–4, 17, 475–476
- defining task 168–174
- collecting dataset 169–173
- exploring data 173
- framing problem 168–169
- metric for success 173–174
- deploying model 178–183
- explaining work to stakeholders and setting expectations 178–179
- maintaining 183
- monitoring 182–183
- shipping inference model 179–182
- developing model 174–178
- beating baseline 176
- choosing evaluation protocol 175–176
- overfitting 177
- preparing data 174–175
- regularizing and tuning model 178
- early neural networks 13–14
- evaluating machine learning models 142–146
- common-sense baselines 145–146
- model evaluation protocol 146
- training, validation, and test sets 142–145
- generalization 130–142
- improving 152–164
- in deep learning 136–142
- underfitting and overfitting 131–136
- gradient-boosting machines 15–16
- improving model fit 146–152
- increasing model capacity 150–152
- leveraging architecture priors 149–150
- tuning gradient descent parameters 147–149
- kernel methods 14–15
- modern machine learning landscape 17–19
- probabilistic modeling 13
- program synthesis vs. 504
- random forests 15–16
- risk of anthropomorphizing 486–488
- universal machine learning workflow 478–479
- MAE (mean absolute error) 124, 310
- maintaining model 183
- manifold hypothesis 137–138
- map_func argument 348
- masking 361–363
- matrices (rank 2 tensors) 32
- max tensor operation 228
- max-pooling operation 228–230
- MaxPooling2D layers 222, 266
- mean squared error (MSE) 123, 312, 481
- Metric instances 216
- metrics
- low-level usage of 211–212
- writing 202–203
- metrics active property 216
- metrics property 99
- MHR (modularity, hierarchy, and reuse) formula 269
- mini Xception-like model 280–282
- mini-batch SGD (stochastic gradient descent) 52
- MirroredStrategy object 469
- missing values 175
- mixed precision 465–467
- floating-point precision 465–467
- in practice 467
- model checkpoint callbacks 204–205
- model ensembling 462–464
- model parallelism 468
- model subclassing 186
- model$layers property 195
- modules 527–529
- monitoring model 182–183
- MSE (mean squared error) 123, 312, 481
- mse loss function 113
- multi-GPU training 468–471
- single-host, multidevice synchronous training 468–471
- two or more GPUs 468
- multi-head attention 371–372
- multi-input, multi-output models
- overview 191–192
- training 192–193
- multiclass classification example 114–121
- building model 116–117
- generating predictions on new data 119–120
- handling labels and loss 120
- large intermediate layers, importance of 120–121
- preparing data 115–116
- Reuters dataset 114–115
- validating approach 117–119
- MultiHeadAttention layers 394
- multilabel categorical classification 480
N
- N-gram tokenization 338
- NaiveDense class 62
- NaiveSequential class 62
- natural language processing. See NLP
- network architectures 479–483
- convnets 481
- densely connected networks 479–481
- RNNs 482
- Transformers 482–483
- network size 155–158
- neural networks 7, 477
- advancements of 16–17
- binary classification example 105–114
- building model 108–110
- IMDB dataset 105–106
- preparing data 107
- using trained model to generate predictions on new data 113
- validating approach 110–113
- data representations for 31–37
- data batches 35
- image data 36–37
- key attributes 33–34
- manipulating tensors in R 34
- matrices (rank 2 tensors) 32
- rank 3 and higher-rank tensors 32–33
- real-world examples of data tensors 35
- scalars (rank 0 tensors) 31
- time-series data or sequence data 36
- vector data 35–36
- vectors (rank 1 tensors) 31–32
- video data 37
- early iterations of 13–14
- example of 27–31, 59–66
- evaluating model 66
- full training loop 65
- reimplementing from scratch in TensorFlow 61–63
- running one training step 63–65
- gradient-based optimization 48–59
- backpropagation algorithm 54–59
- derivatives 49–50
- gradients 50–51
- stochastic gradient descent 51–54
- multiclass classification example 114–121
- building model 116–117
- generating predictions on new data 119–120
- handling labels and loss 120
- large intermediate layers, importance of 120–121
- preparing data 115–116
- Reuters dataset 114–115
- validating approach 117–119
- regression example 121–128
- Boston housing price dataset 122
- building model 123–124
- generating predictions on new data 128
- preparing data 122–123
- validating approach using K-fold validation 124–128
- tensor operations 37–48
- broadcasting 40–41
- element-wise operations 38–39
- geometric interpretation of 44–48
- tensor product 41–43
- tensor reshaping 43–44
- neural style transfer 422–431
- content loss 423–424
- neural style transfer in Keras 424–431
- style loss 424
- new_layer_class() function 90
- next method 525
- ngrams = N argument 350
- NLP (natural language processing) overview 334–336
- preparing text data 336–344
- text splitting (tokenization) 338–339
- text standardization 337–338
- using layer_text_vectorization 340–344
- vocabulary indexing 339–340
- representing groups of words 344–366
- bag-of-words approach 347–354
- preparing IMDB movie reviews data 345–347
- sequence model approach 355–366
- sequence-to-sequence learning 382–398
- machine translation example 383–387
- with RNNs 387–392
- with Transformer 392–398
- Transformer architecture 366–382
- multi-head attention 371–372
- self-attention 366–371
- Transformer encoder 372–381
- when to use sequence models over bag-of-words models 381–382
- noisy training data 132
- normalization 174–175, 275–278
O
- object detection 260
- objective function 9
- Occam’s razor 159
- on_batch_* method 206
- on_epoch_* method 206
- one-hot encoding 116
- optimization 131
- Optimizer instance 64
- output feature map 224
- output_mode argument 353
- overfitting 31, 131–136
- ambiguous features 133
- developing model 177
- noisy training data 132
- rare features and spurious correlations 133–136
- using recurrent dropout to fight 324–327
P
- pack arguments 521
- packing tuples 515–516
- padding 226–227, 361–363
- patience value 460
- pipe operator (%>%) 93
- plot() function 194, 262
- plot() method 111
- positional encoding 378–379
- PositionalEmbedding layers 396, 483
- POSIXct format 304
- predict() loops 291
- predict() method 113, 198, 249
- predictions
- binary classification example 113
- multiclass classification example 119–120
- regression 128
- pretrained models 245–257
- feature extraction with 246–253
- fast feature extraction without data augmentation 249–251
- feature extraction together with data augmentation 251–253
- fine-tuning 254–257
- pretrained word embeddings 359, 363–366
- print() method 188, 320
- probabilistic modeling 13
- program synthesis 503
- blending together deep learning and 504–506
- integrating deep learning modules and algorithmic modules into hybrid systems 504–505
- using deep learning to guide program search 505–506
- machine learning vs. 504
- program-centric analogy 499–500
- program-space intuition 508
- programs, models as 503
- progressive disclosure of complexity 186
- [:punct:] class 385
- Python
- for R users 519–520
- decorators 531–532
- defining classes with class 522–526
- defining functions with def 520–521
- defining generators with yield 526–527
- import and modules 527–529
- integers and floats 530
- iteration closing remarks 527
- iteration with for 518–520
- R vectors 530–531
- whitespace 511
- with and context management 532
- R interfaces and 71–72
Q
- query-key-value model 370–371
R
- R (language)
- manipulating tensors in 34
- Python for
- container types 512–517
- decorators 531–532
- defining classes with class 522–526
- defining functions with def 520–521
- defining generators with yield 526–527
- import and modules 527–529
- integers and floats 530
- interfaces from Python 71–72
- iteration closing remarks 527
- iteration with for 518–520
- R vectors 530–531
- whitespace 511
- with and context management 532
- random forests 15–16
- randomized A/B testing 182
- rank 3 tensors 32–33
- raster object 299
- reconstruction loss 436
- recurrent neural networks. See RNNs
- regression 121–128
- Boston housing price dataset 122
- building model 123–124
- generating predictions on new data 128
- preparing data 122–123
- validating approach using K-fold validation 124–128
- regularization loss 436
- regularizing model 155–164, 178
- adding dropout 161–164
- adding weight regularization 159–161
- reducing network size 155–158
- relu (rectified linear unit) 109
- relu activation 277
- relu operation 38
- reset_state() method 203
- reshaping tensors 77–78
- residual connections 272–275
- response map 225
- REST API 179–180
- result() method 203
- return_sequences argument 320
- Reuters dataset 114–115
- RGB (red-green-blue) format 5
- RMSE (root mean squared error) 202
- RMSprop optimizer 332
- rmsprop optimizer 60, 110
- RNNs (recurrent neural networks) 317–333
- overview 482
- recurrent layer in Keras 320–324
- stacking recurrent layers 327–329
- using bidirectional RNNs 329–332
- using recurrent dropout to fight overfitting 324–327
- ROC (receiver operating characteristic) 173
S
- samples axis 35
- sampling bias 172
- sampling strategy 402–404
- scalars (rank 0 tensors) 31
- scale() function 123, 175
- scaling-up model training 464–472
- multi-GPU training 468–471
- single-host, multidevice synchronous training 468–471
- two or more GPUs 468
- speeding up training on GPU with mixed precision 465–467
- floating-point precision 465–467
- in practice 467
- TPU training 471–472
- schematic GAN implementation 443–444
- second-order gradients 84
- segmentation mask 260
- self-attention 366–371
- semantic segmentation 260
- SeparableConv2D layers 290, 481
- sequence generation
- data for 402
- history of generative deep learning for 401–402
- sequence model approach 355–366
- learning word embeddings with the embedding layer 359–361
- padding and masking 361–363
- practical example 355–356
- using pretrained word embeddings 363–366
- when to use bag-of-words approach over 381–382
- word embeddings 357–359
- sequence-to-sequence learning 382–398, 482
- machine translation example 383–387
- with RNNs 387–392
- with Transformer 392–398
- Transformer decoder 393–396
- Transformer for machine translation 396–398
- sequence-to-sequence model 370
- Sequential class 62
- Sequential model 186–189
- sets 517
- SGD (stochastic gradient descent) 51–54, 95
- shape() function 76
- shaping tensors 77–78
- shortcut rule 493–495
- shuffling, iterated K-fold validation with 145
- sigmoid activation 234, 480
- simple model 159
- single words (unigrams) with binary encoding 347–350
- slicing tensors 78–79
- softmax activation 117, 480
- softmax classification layer 29
- softmax temperature 403
- sparse_categorical_crossentropy loss function 121
- spurious correlations 133–136
- stacking recurrent layers 327–329
- stakeholders 178–179
- stemming 338
- step fusing 472
- steps_per_execution argument 472
- stochastic gradient descent (SGD) 51–54, 95
- stochastic sampling 403
- StopIteration exception 525
- strides, convolution 227
- style loss 424
- subclassing model class 196–199
- rewriting previous example as subclassed model 197–199
- what subclassed models don’t support 199
- subroutines 506–507
- SVM (Support Vector Machine) 14
- symbolic AI 3
- symbolic tensor 190
T
- tanh activation 113
- target leaking 173
- target_vectorization layer 397
- targets array 308
- temperature value 403
- temperature-forecasting example 302–316
- 1D convolutional model 314–315
- basic machine learning model 311–313
- first recurrent baseline 316
- non-machine learning baseline 310–311
- preparing data 306–309
- Tensor objects 213
- tensor operations 37
- tensor product operation 41–43
- tensor slicing 34
- TensorBoard 208–209
- TensorFlow
- example of neural networks 61–63
- batch generator 63
- Dense class 61–62
- Sequential class 62
- gradient tape in 58–59
- Keras
- history with TensorFlow 71
- installing 73–74
- overview 69–70
- Keras APIs 89–101
- compile step 95–98
- fit() method 99
- inference 101
- layers 89–93
- models 94–95
- monitoring loss and metrics on validation data 99–100
- picking loss function 98
- overview 69
- Python and R interfaces 71–72
- setting up deep learning workspace 72–74
- TensorFlow Serving 180
- tensors 31–37
- attributes 75–89
- broadcasting 79
- constant tensors and variables 81–82
- GradientTape API 83–84
- linear classifier in TensorFlow example 84–89
- operations 82–83
- shape and reshaping 77–78
- slicing 78–79
- tf module 80–81
- data batches 35
- image data 36–37
- key attributes 33–34
- manipulating tensors in R 34
- matrices (rank 2 tensors) 32
- operations 37–48
- broadcasting 40–41
- element-wise operations 38–39
- geometric interpretation of 44–47
- geometric interpretation of deep learning 47–48
- tensor product 41–43
- tensor reshaping 43–44
- overview 74–75
- rank 3 and higher-rank tensors 32–33
- real-world examples of data tensors 35
- scalars (rank 0 tensors) 31
- time-series data or sequence data 36
- vector data 35–36
- vectors (rank 1 tensors) 31–32
- video data 37
- TensorShape object 76
- test sets 142–145
- test_step() function 214
- text data 336–344
- text splitting (tokenization) 338–339
- text standardization 337–338
- using layer_text_vectorization 340–344
- vocabulary indexing 339–340
- text generation 401–413
- generating sequence data 402
- history of generative deep learning for sequence generation 401–402
- implementing text generation with Keras 404–408
- preparing data 404–406
- Transformer-based sequence-to-sequence model 406–408
- sampling strategy, importance of 402–404
- text-generation callback with variable-temperature sampling 408–413
- text standardization 337–338
- text_dataset_from_directory utility 346
- text_vectorization layer 354
- text-classification Transformer 379–381
- TextVectorization layers 384
- tf module 80–81
- tf_function() 215–216, 293, 410
- TF-IDF (term frequency, inverse document frequency) encoding 352–354
- tf.string dtype tensors 342
- tf.string tensors 347
- tf.TensorShape object 76
- tf$io module functions 267
- tf$Variable class 81
- tfdataset functions 310
- tfdataset instance 236
- tfdataset iteration loop 215
- tfdataset iterator 237
- tfdataset object 99, 213, 236, 307, 342, 469
- tfdataset pipeline 342
- theta angle 153
- time series
- different kinds of time-series tasks 301–302
- RNNs (recurrent neural networks) 317–333
- recurrent layer in Keras 320–324
- stacking recurrent layers 327–329
- using bidirectional RNNs 329–332
- using recurrent dropout to fight overfitting 324–327
- temperature-forecasting example 302–316
- 1D convolutional model 314–315
- basic machine learning model 311–313
- first recurrent baseline 316
- non-machine learning baseline 310–311
- preparing data 306–309
- time-series data (sequence data) 36
- tokenization (text splitting) 338–339
- TPU (Tensor Processing Unit) 21, 471–472
- TPUStrategy scope 471
- train_step() method 216
- training 142–145
- convnets 230–245
- building model 234–235
- data preprocessing 235–241
- downloading data 231–233
- relevance of deep learning for small data problems 230–231
- using data augmentation 241–245
- data 141–142
- scaling-up model training 464–472
- multi-GPU training 468–471
- speeding up training on GPU with mixed precision 465–467
- TPU training 471–472
- training argument 210
- training loops 10, 48
- inference versus 210–211
- using built-in 201–209
- monitoring and visualization with TensorBoard 208–209
- using callbacks 204–205
- writing own callbacks 205–207
- writing own metrics 202–203
- writing 210–218
- complete training and evaluation loop 212–214
- fit() with custom training loop 216–218
- low-level usage of metrics 211–212
- tf_function() 215–216
- training versus inference 210–211
- training step 63–65
- Transformer
- architecture 366–382
- multi-head attention 371–372
- self-attention 366–371
- when to use sequence models over bag-of-words models 381–382
- overview 482–483
- sequence-to-sequence model 406–408
- Transformer decoder 373
- Transformer encoder 372–381
- text-classification Transformer 379–381
- using positional encoding to reinject order information 378–379
- TransformerDecoder 483
- TransformerEncoder 483
- translation invariant 481
- tricks, GANs (generative adversarial networks) 444–445
- tuning model 178
- tuples 514–516
- Turing test 3
U
- uint8 integers 465
- underfitting 131–136
- ambiguous features 133
- noisy training data 132
- rare features and spurious correlations 133–136
- unordered containers 517
- unpack arguments 521
- unpacking tuples 515–516
- untar() utilities 261
- update_state() method 202
- update_weights function 64
V
- VAEs (variational autoencoders) 431–442
- concept vectors for image editing 433
- image generation 434–436
- implementing with Keras 436–441
- sampling from latent spaces of images 432
- variational autoencoders 434–436
- validation 142–145
- holdout validation 143–144
- iterated K-fold validation with shuffling 145
- K-fold validation 144–145
- monitoring loss and metrics on 99–100
- validation metrics 175
- validation_data argument 100, 110, 239
- value-centric analogy 498–499
- vanishing-gradient problem 321
- Variable instance 58
- variable_dtype property 467
- variable-temperature sampling 408–413
- variables 81–82
- variational autoencoders 434–436
- vector data 35–36
- vectorization 174, 336, 476
- vectors (rank 1 tensors) 31–32
- VGG16 model 249
- video data 37
- VM (virtual machine) 468
- vocabulary indexing 339–340
W
- weight regularization 159–161
- weight regularizer instances 159
- weights function 247
- weights property 62
- whitespace 511
- with statement 532
- word embeddings 357–359
- word-level tokenization 338
- words 344–366
- bag-of-words approach 347–354
- bigrams with binary encoding 350–351
- bigrams with TF-IDF encoding 352–354
- single words (unigrams) with binary encoding 347–350
- preparing IMDB movie reviews data 345–347
- sequence model approach 355–366
- learning word embeddings with embedding layer 359–361
- padding and masking 361–363
- practical example 355–356
- using pretrained word embeddings 363–366
- word embeddings 357–359
- WSL (Windows Subsystem for Linux) 72
X
- xception_preprocess_input utility function 296
Z
- zip_lists() helper function 64