act()
method, defining DQN agent, 299–300, 301
Action potential, of biological neurons, 85–86
Action(s)
deep Q-learning network theory, 290–292
DeepMind DQN and, 59
Markov decision processes and, 286
reinforcement learning problems and, 55–56
Activation functions
calculus behind backpropagation, 335–336
choosing neuron type, 96
convolutional example, 164–166
nonlinear nature of, in deep learning architectures, 95
softmax layer of fast food-classifying network, 106–108
tanh neuron, 94
Activation maps
convolutional networks and, 238
in discriminator network, 267–268
in generator network, 269, 272
LeNet-5 ConvNet architecture, 173–175
as output from convolutional kernels, 163–167
pooling layers spatially reducing, 169–170
Actor-critic algorithm, RL agent, 307–308
AdaGrad optimizer, 146
Adaptive moment estimation (Adam) optimizer, 147, 148–149
Adversarial network, GANs, 272–274
Agent(s)
deep Q-learning network theory, 290–292
deep reinforcement learning and, 57
DeepMind DQN, 58
DQN. See DQN agents
optimal policy in deep reinforcement learning, 289–290
reinforcement learning problems of machine learning, 54–56
reinforcement learning theory, 283
SLM Lab, 304
AGI (artificial general intelligence), 72, 326–328
AI. See Artificial intelligence (AI)
AlexNet
CNN model inspired by, 176–177
history of deep learning for NLP, 25
ReLU neurons in, 95
Algorithms, development of AGI and, 327
AlphaGo Master, 64
Amazon review polarity, NLP training/validation samples, 316
ANI (artificial narrow intelligence), 72, 326–327
Architecture
AlexNet hierarchical, 16
bidirectional LSTM sentiment classifier, 248
convolutional sentiment classifier, 237–238
deep learning model, 52
deep net in Keras model, 148–149
dense sentiment classifier, 229–231
generalist neural network as single network, 52
intermediate-depth neural network, 127–128
LeNet-5 hierarchical, 9–11, 172–176
LSTM, 247
multi-ConvNet sentiment classifier, 253–254
regression model network, 150–151
residual network, 182
RNN sentiment classifier, 243–244
shallow neural network, 78–79, 83
stacked recurrent model, 249
TensorFlow Playground, 17
weight initialization, 133–135
Arithmetic, on fake human faces, 41–44
Art. See Machine art
Artificial general intelligence (AGI), 72, 326–328
Artificial intelligence (AI)
deep learning for NLP relevant to, 53
driven by deep learning, 52
general-purpose learning algorithms for, 58
history of chess and, 65
machine learning as subset of, 50
OpenAI Gym environments as, 68–70
Artificial narrow intelligence (ANI), 72, 326–327
Artificial neural networks (ANNs). See also Artificial neurons, constituting ANNs
AlphaGo Zero development, 63
architecture for shallow networks, 83–84
building model for DQN agent, 297–298
deep reinforcement learning using, 56–57
dominating representation learning, 51
hot dog-detecting dense network, 101–106
input layer, 99
key concepts, 110
manipulation of objects via, 67–68
schematic diagram of shallow network, 77–79
softmax layer of fast food-classifying network, 106–108
summary, 110
Artificial neurons
deep learning and, 22
deep learning model architectures, 51–52
Artificial neurons, constituting ANNs
biological neuroanatomy, 85–86
choosing neuron type, 96
hot dog/not hot dog detector, 86–90
key concepts, 97
modern neurons/activation functions, 91–95
most important equation in this book, 90–91
overview of, 85
perceptrons as early, 86
summary, 96
tanh neuron, 94
Artificial super intelligence (ASI), 72, 327
astype()
method, LeNet-5 in Keras, 172
Attention, seq2seq and, 250
Backpropagation
of bidirectional LSTMs, 247
cross-entropy costs and, 114
enabling neural networks to learn, 113
minimizing cost, 115
partial-derivative calculus behind, 335–337
training recurrent neural networks, 241
tuning hidden-layer and neuron counts, 125–126
BAIR (Berkeley Artificial Intelligence Research) Lab, 44–45
Batch normalization
deep neural networks in Keras, 148
improving deep networks, 138–139
network architecture regression model, 150–151
Batch size
of 1, also known as online learning, 124
building own project and, 320
escaping local minimum of cost, 122–124
as hyperparameter, 119
and stochastic gradient descent, 119–122
Behavioral cloning, 307
Benchmarking performance, SLM Lab, 304
Bengio, Yoshua
Turing Award for deep learning, 15
weight initialization and Glorot normal distribution, 135–137
Berkeley Artificial Intelligence Research (BAIR) Lab, 44–45
BERT (bi-directional encoder representations from transformers), NLP, 251
beta (β) hyperparameter
batch normalization adding, 139
Bi-directional encoder representations from transformers (BERT), NLP, 251
bias (b)
adding to convolutional layers, 162
in convolutional example, 164
minimizing cost via gradient descent, 115–116
notation for neural networks, 333
Bidirectional LSTMs (Bi-LSTMs), 247–249
Binary-only restriction, of perceptrons, 91–92
Biological neurons
creating perceptron algorithm with, 86
ReLU neuron activation function, 94–95
Board games
overview of, 59
boston_housing
dataset, 149–150
Bostrom, Nick, 72
Bounding boxes, developing YOLO, 185–186
build_discriminator
function, 266–268
Caffe, deep learning library, 324
Calculus, in backpropagation, 335–337
callbacks
argument
dense sentiment classifier, 232
Cambrian explosion, 3
Capsule networks, machine vision and, 192
Cart-Pole game
defining DQN agent for. See DQN agents
DQN agent interaction with OpenAI Gym, 300–303
estimating optimal Q-value, 292
hyperparameter optimization using SLM Lab, 304
Markov decision processes in, 288–289
as reinforcement learning problem, 284–286
CartPole, OpenAI Gym environment, 70
CBOW (continuous bag of words), word2vec, 207, 208
Cell body, biological neurons, 85–86
Cerebral cortex, processing visual information, 3–4
cGAN (conditional GAN), 45
Chain rule of calculus, backpropagation and, 124
Chatbots, natural language processing in, 23–24
Checkpoints, dense sentiment classifier, 231
Chen, Chen, deep learning image processing, 47–48
Chess
vs. Go board complexity, 59, 61
Classification
adding layers to transfer learning model, 190
convolutional sentiment classifier, 235–239
of film reviews by sentiment, 229–235
natural language. See Natural language classification
as supervised learning problem, 53–54
CNNs. See Convolutional neural networks (CNNs)
CNTK, deep learning library, 324
Coding shallow network in Keras
designing neural network architecture, 83
installation, 76
MNIST handwritten digits, 76–77
schematic diagram of network, 77–79
software dependencies for shallow net, 80
summary, 84
training deep learning model, 83–84
Color, visual cortex detects, 7–8
Compiling
adversarial network, 274
dense sentiment classifier, 231
discriminator network, 269
network model for DQN agent, 298
Complex neurons
forming primary visual cortex, 6–7
neocognitron and, 9
Computational complexity
minimizing number of kernels to avoid, 163
from piping images into dense networks, 160
Computational homogeneity, with Software 2.0, 325
Computing power, AGI and development of, 327
Conditional imitation learning algorithms, 307
Content generation, building socially beneficial projects, 318
Context words, running word2vec, 207–209
Continuous bag of words (CBOW), word2vec, 207, 208
Continuous variable, supervised learning problem, 54
Contracting path, U-Net, 187–188
Conv2D
dependency, LeNet-5 in Keras, 171–174
Convolutional filter hyperparameters, CNNs, 168–169
Convolutional layers
convolutional neural networks (CNNs) and, 160–162
general approach to CNN design, 176
working with pooling layers, 169–170
Convolutional layers, GANs
birth of GANs, 41
convolutional neural networks (CNNs) and, 52–53
results of latent space arithmetic, 42–44
Convolutional neural networks (CNNs)
computational complexity, 160
contemporary machine vision and, 52–53
convolutional filter hyperparameters, 168–169
DeepMind DQN using, 58
detecting spatial patterns among words, 235–239
developing Faster R-CNN, 184–185
general approach to CNN design, 176
image segmentation with Mask R-CNN, 187
manipulation of objects via, 67–68
model inspired by AlexNet, 176–178
model inspired by VGGNet, 178–179
object detection with Fast R-CNN, 184
object detection with R-CNN, 183–184
overview of, 159
transfer learning model of, 188–192
two-dimensional structure of visual imagery, 159–160
Convolutional sentiment classifier, 235–239, 252–256
Conv2DTranspose
layers, in generator networks, 270
Corpus
one-hot encoding of words within, 25–26
word2vec architectures for, 208
Cortes, Corinna, curating MNIST dataset, 77–78
Cost (loss) functions
building own project, 319
in stochastic gradient descent, 120
training deep networks and, 111
using backpropagation to calculate gradient of, 124–125, 335–337
Cost, minimizing via optimization
batch size and stochastic gradient descent, 119–122
escaping local minimum, 122–124
training deep networks and, 115
Count-based methods, word2vec vs., 208
Cross-entropy cost
essential GAN theory, 262
minimizes impact of neuron saturation, 113–115, 131
Data
augmentation, training deep networks, 145
development of AGI and, 327
Data generators, training, 190–191
DataFrame, IMDb validation data, 234
Datasets, deep reinforcement learning using larger, 57
De-convolutional layers, generator networks, 269–270, 272
deCNN, generator network as, 270
Deep Blue, history of chess, 65
Deep learning
code. See Coding shallow network in Keras
computational representations of language. See Language, computational representations of
definition of, 22
elements of natural human language in, 33–35
Google Duplex as NLP based on, 35–37
natural language processing and, 23–25, 37
networks learn representations automatically, 22–23
reinforcement learning combined with. See Reinforcement learning, deep
training deep networks. See Training deep networks
Deep learning, introduction
machine vision. See Machine vision
Quick, Draw! game, 19
summary, 20
traditional machine learning vs., 11–12
Deep learning projects, building own
artificial general intelligence approach, 326–328
converting existing machine learning project, 316–317
deep learning libraries, 321–324
deep reinforcement learning, 316
machine vision and GANs, 313–315
modeling process, including hyperparameter tuning, 318–321
natural language processing, 315–316
overview of, 313
resources for further projects, 317–318
Deep networks, improving
deep neural network in Keras, 147–149
model generalization (avoiding overfitting), 140–145
overview of, 131
summary, 154
weight initialization, 131–135
Xavier Glorot distributions, 135–137
Deep Q-learning networks (DQNs)
DeepMind video game and, 58–60
defining DQN agent. See DQN agents
Deep reinforcement learning. See Reinforcement learning, deep
DeepMind
AlphaGo Zero board game, 62–65
Google acquiring, 59
DeepMind Lab
building own deep learning project with, 316
deep reinforcement learning, 69, 71
Dendrites, and biological neurons, 85–86
Denormalization, in batch normalization, 139
Dense layers
architecting intermediate net in Keras, 127–128
artificial neural networks with, 99–100
CNN model inspired by AlexNet, 177–178
computational complexity and, 160
convolutional layers vs., 168
deep learning and, 51
Fast R-CNN and, 184
general approach to CNN design, 176
LeNet-5 in Keras and, 172–173, 175–176
multi-ConvNet model architecture, 253–255
in natural language processing, 224–225, 230–231, 236–238
networks designed for sequential data, 243
in shallow networks, 109
using weight initialization for deep networks, 132–133, 137
in wide and deep model architecture, 317
Dense network
building socially beneficial projects, 318
defined, 100
revisiting shallow network, 108–110
softmax layer of fast food-classifying network, 106–108
Dense sentiment classifier, 229–235
Dense Sentiment Classifier Jupyter notebook. See Natural language classification
Dependencies
Cart-Pole DQN agent, 293
convolutional sentiment classifier, 236
LeNet-5 in Keras, 171
loading GAN for Quick, Draw! game, 264–265
loading IMDb film reviews, 222–223
preprocessing natural language, 197
regression model, 150
TensorFlow with Keras layers, 323
Dimensionality reduction, plotting word vectors, 213–217
Discount factor (decay), Markov decision processes, 288–289
Discounted future reward
expected, 290
maximizing, 290
Discriminator network, GANs
code for training, 277
Distributed representations, localist representations vs., 32
Dot product notation, perceptron equation, 90–91
DQN agents
building neural network model for, 297–298
drawbacks of, 306
hyperparameter optimization using SLM Lab, 304
initialization parameters, 295–297
interacting with OpenAI Gym environment, 300–303
remembering gameplay, 298
selecting action to take, 299–300
training via memory replay, 298–299
DQNs. See Deep Q-learning networks (DQNs)
Dropout
for AlexNet in Keras, 177
for deep neural network in Keras, 148
Eager mode, TensorFlow, 322–323
Ease of use, Software 2.0, 326
Efros, Alexei, 44
ELMo (embeddings from language models), transfer learning, 251
Elo scores
AlphaZero game, 66
Encoder-decoder structure, NMT, 250
Environment(s)
DeepMind DQN, 58
popular deep reinforcement learning, 68
reinforcement learning problems of machine learning, 54–56
reinforcement learning theory, 283
training agents simultaneously via SLM Lab in multiple, 304
Epochs of training, checkpointing model parameters after, 231–232
Essential theory. See Theory, essential
exp
function, softmax layer of fast food-classifying network, 106–108
Expanding path, U-Net, 187–188
Experiment graph, SLM Lab, 304
Expertise, subject-matter
AlexNet reducing requirement for, 17
deep learning easing requirement for, 22–23
Exploding gradients, ANNs, 138
Extrinsic evaluations, evaluating word vectors, 209
Face detection
arithmetic on fake human faces, 41–44
birth of generative adversarial networks, 39–41
engineered features for robust real-time, 12–13
in visual cortex, 8
Facebook, fastText library, 33
False negative, IMDb reviews, 236
False positive, IMDb reviews, 235
Fan Hui, AlphaGo match, 62
Fancy optimizers, deep network improvement, 145–147
Fashion-MNIST dataset, deep learning project, 313–315
Fast food-classifying network, softmax layer of, 106–108
Fast R-CNN, object detection, 184
Faster R-CNN, object detection, 184–185
Feature engineering
AlexNet automatic vs. expertise-driven, 17
defined, 11
traditional machine learning and, 12–13
traditional machine learning vs. deep learning, 11–12
Feature maps
convolutional example of, 163–167
image segmentation with U-Net, 188
transfer learning model and, 188–192
Feedforward neural networks, training, 241
FetchPickAndPlace, OpenAI Gym, 70
Figure Eight
image-classification model, 315
natural language processing model, 316
Filters. See Kernels (filters)
Finn, Chelsea, 67
fit_generator()
method, transfer learning, 191–192
Fitting, dense sentiment classifier, 232
Flatten
layer, LeNet-5 in Keras, 171–174
FloatTensor, PyTorch, 339
for
loop, GAN training, 275–281
Formal notation, neural networks, 333–334
Forward propagation
backpropagation vs., 124
defined, 103
in hot dog-detecting dense network, 101–106
notation for neural networks, 334
in softmax layer of fast food-classifying network, 106–108
in stochastic gradient descent, 120, 121
Frozen Lake game, 316
Fukushima, Kunihiko, neocognitron, 9–12
Fully connected layer (as dense layer), 99
Functional API, non-sequential architectures and Keras, 251–256
Fusiform face area, face detection in visual cortex, 8
Game-playing machines
artificial intelligence, 49–50
artificial neural networks (ANNs), 51
categories of machine learning problems, 53–56
deep reinforcement learning, 56–57
machine learning, 50
manipulation of objects, 67–68
natural language processing, 53
popular deep reinforcement learning environments, 68–71
representation learning, 51
Software 2.0 and, 326
summary, 72
gamma (γ), batch normalization adding, 139
GANs. See Generative adversarial networks (GANs)
Gated recurrent units (GRUs), 249–250
gberg_sents, tokenizing natural language, 199
Generative adversarial networks (GANs)
actor-critic algorithm reminiscent of, 308
adversarial network component, 272–274
arithmetic on fake human faces, 41–44
building and tuning own, 315
creating photorealistic images from text, 45–46
discriminator network component, 266–269
generator network component, 269–272
high-level concepts behind, 39
image processing using deep learning, 47–48
making photos photorealistic, 45
Quick, Draw! game dataset, 263–266
reducing computational complexity with, 170
Software 2.0 and, 326
summary, 281
Generator network, GANs, 269–272
GitHub repository, Quick, Draw! game dataset, 263
Global minimum of cost, training deep networks for, 122–124
Glorot normal distribution, improving deep networks, 135–137
GloVe
converting natural words to word vectors, 28
as major alternative to word2vec, 208
Goodfellow, Ian
arithmetic on fake human faces and, 41–44
Google Duplex technology, deep-learning-based NLP, 35–37
GPUs (graphics processing units), deep reinforcement learning, 57
Gradient descent
batch size and stochastic, 119–122
cross-entropy costs and, 114
enabling neural networks to learn, 113
escaping local minimum using, 122–124
training deep networks with batch size/stochastic, 119–122
Graesser, Laura, 304
Graphics processing units (GPUs), deep reinforcement learning, 57
GRUs (gated recurrent units), 249–250
Gutenberg, Johannes, 197
HandManipulateBlock, OpenAI Gym, 70
Handwritten digits, MNIST, 76–78
Hidden layers
artificial neural network with, 99
building network model for DQN agent, 297
calculus behind backpropagation, 337
deep learning model architectures, 51–52
dense layers within. See Dense layers
forward propagation in dense network through, 102–106
hot dog-detecting dense network, 101–106
neural network notation, 333–334
schematic diagram of shallow network, 79
TensorFlow Playground demo, 100
tuning neuron count and number of, 125–126
Hidden state, LSTM, 245
Hierarchical softmax, training word2vec, 208
Hinton, Geoffrey
developing capsule networks, 192
developing t-distributed stochastic neighbor embedding, 213–214
as godfather of deep learning, 14–15
Histogram of validation data
convolutional sentiment classifier, 239
dense sentiment classifier, 233–234
Hochreiter, Sepp, 244
Hot dog-detecting dense network, 101–106
Hot dog/not hot dog detector, perceptrons, 86–90
Hubel, David
LeNet-5 model built on work of, 10–12
machine vision approach using work of, 8–9
research on visual cortex, 4–7
Human and machine language. See also Language, computational representations of
deep learning for natural language processing, 21–25
elements of natural human language in, 33–35
Google Duplex technology, 35–37
summary, 37
Humanoid, OpenAI Gym environment, 70
Hyperparameters. See also Parameters
in artificial neural networks, 130
automating search for, 321
batch size, 119
convolutional filter, 163, 167
convolutional sentiment classifier, 236–237
learning rate, 118
for loading IMDb film reviews, 223–225
multi-ConvNet sentiment classifier, 253
number of epochs of training, 122
optimizing with SLM Lab, 303–306
reducing model overfitting with dropout, 144–145
RMSProp and AdaDelta, 147
ILSVRC (ImageNet Large Scale Visual Recognition Challenge)
ResNet, 182
traditional ML vs. deep learning entrants, 13–14
Image classification
building socially beneficial projects using, 318
ILSVRC competition for, 182
machine vision datasets for deep learning, 313–315
object detection vs., 183
Image segmentation applications, machine vision, 186–188
ImageDataGenerator
class, transfer learning, 190–191
Images
creating photorealistic. See Machine art
processing using deep learning, 46–48
IMDb (Internet Movie Database) film reviews. See Natural language classification
Imitation learning, agents beyond DQN optimizing, 307
Infrastructure, rapid advances in, 327
Initialization parameters, DQN agent, 295–297
Input layer
artificial neural networks with, 99
of deep learning model architectures, 51–52
hot dog-detecting dense network, 101–106
LSTM, 245
notation for neural networks, 333
schematic diagram of shallow network, 79
TensorFlow Playground demo, 100
Installation
of code notebooks, 76
PyTorch, 341
Integer labels, converting to one-hot, 82–83
Intermediate Net in Keras Jupyter notebook, 127–129
Internal covariate shift, batch normalization, 138–139
Internet Movie Database (IMDb) film reviews. See Natural language classification
Intrinsic evaluations, word vectors, 209
iter
argument, running word2vec, 210
Kaggle
image-classification model, 315
natural language processing model, 316
Kasparov, Garry, 65
Keng, Wah Loon, 304
Keras
AlexNet and VGGNet in, 176–179
coding in. See Coding shallow network in Keras
deep learning library in, 321–323
deep neural network in, 147–149
functional API, non-sequential architectures and, 251–256
implementing RNN, 242
intermediate-depth neural network in, 127–129
loading IMDb film reviews in, 225–226
parameter-adjustment in, 144
TensorBoard dashboard in, 152–154
weight initialization in, 132–135
Kernels (filters)
convolutional example of, 164–167
of convolutional layers, 160–162
number in convolutional layer, 162–163
size, convolutional filter hyperparameter, 167
Key concepts
artificial neural networks (ANNs), 110
artificial neurons that constitute ANNs, 97
deep reinforcement learning, 308–309
generative adversarial networks (GANs), 281–282
improving deep networks, 154–155
machine vision, 193
natural language processing (NLP), 256–257
training deep networks, 130
L1 vs. L2 regularization, reducing model overfitting, 141–142
Language. See Human and machine language
Language, computational representations of
localist vs. distributed representations, 32–33
one-hot representations of words, 25–26
overview of, 25
word2viz tool for exploring, 30–32
LASSO regression, reducing model overfitting, 141–142
Latent space
arithmetic on fake human faces in, 42–44
birth of generative adversarial networks, 40–41
Layers
deep learning model architectures, 51–52
Leaky ReLU activation function, 96
Learn Python the Hard Way (Shaw), 75
Learning rate
batch normalization allowing for higher, 139
building own project, 320
shortcomings of improving SGD with momentum, 146
as step size in gradient descent, 117–119
LeCun, Yann
on fabricating realistic images, 39
MNIST handwritten digits curated by, 76–78
Turing Award for deep learning, 15
Legg, Shane, 58
Lemmatization, as sophisticated alternative to stemming, 196
LeNet-5 model, 9–11, 171–176
Les 3 Brasseurs bar, 39
Levine, Sergey, 67
Libraries, deep learning, 321–324
Linear regression, object detection with R-CNN, 183–184
List comprehension
adding word stemming to, 201
removing stop words and punctuation, 200–201
load()
method, neural network model for DQN agent, 300
Loading data
coding shallow network in Keras, 79–81
load_weights()
method, loading model parameters, 232
Local minimum of cost, escaping, 122–124
Localist representations, distributed representations vs., 32–33
Long short-term memory (LSTM) cells
bidirectional (Bi-LSTMs), 247–248
implementing with Keras, 246–247
as layer of NLP, 53
Long-term memory, LSTM, 245–246
Lowercase
converting all characters in NLP to, 195–196, 199–200
processing full corpus, 204–206
LSTM. See Long short-term memory (LSTM) cells
LunarLander, OpenAI Gym environment, 70
Maaten, Laurens van der, 213–214
Machine art
arithmetic on fake human faces, 41–44
creating photorealistic images from text, 45–46
image processing using deep learning, 46–48
make your own sketches photorealistic, 45
overview of, 39
summary, 48
Machine language. See Human and machine language
Machine learning (ML). See also Traditional machine learning (ML) approach
overview of, 50
reinforcement learning problems of, 54–56
representation learning as branch of, 51
supervised learning problems of, 53–54
traditional machine learning vs. representation learning techniques, 22
unsupervised learning problems of, 54
Machine translation, NLP in, 23–24
Machine vision
AlexNet and VGGNet in Keras, 176–179
CNNs. See Convolutional neural networks (CNNs)
converting existing project, 316–317
datasets for deep learning image-classification models, 313–315
key concepts, 193
object recognition tasks, 52–53
Quick, Draw! game, 19
Software 2.0 and, 326
traditional machine learning approach, 12–13
Machine vision, applications of
capsule networks, 192
Fast R-CNN, 184
Mask R-CNN, 187
object detection, 183
overview of, 182
Magnetic resonance imaging (MRI), and visual cortex, 7–8
Manipulation of objects, 67–68
Markov decision process (MDP), 286–290
Mask R-CNN, image segmentation with, 186–187
Maas, Andrew, 203
matplotlib, weight initialization, 132
max
operation, pooling layers, 170
Max-pooling layers
AlexNet and VGGNet in Keras, 176–179
MaxPooling2D
dependency, LeNet-5 in Keras, 171–174
MCTS (Monte Carlo tree search) algorithm, 61, 66
MDP (Markov decision process), 286–290
Memory
batch size/stochastic gradient descent and, 119–122
DQN agent gameplay, 298
Software 2.0 and, 326
training DQN agent via replay of, 298–299
Metrics, SLM Lab performance, 305–306
Milestones, deep learning for NLP, 24–25
min_count
argument, word2vec, 210–211
Minibatches, splitting training data into, 119–122
ML. See Machine learning (ML)
MNIST handwritten digits
calculus for backpropagation, 337
coding shallow network in Keras, 76–78
computational complexity in dense networks, 160
Fashion-MNIST dataset deep learning project, 313–315
loading data for shallow net, 80–81
loss of two-dimensional imagery in dense networks, 159–160
reformatting data for shallow net, 81–83
schematic diagram of shallow network, 77–79
software dependencies for shallow net, 80
in stochastic gradient descent, 120
training deep networks with data augmentation, 145
Model generalization. See Overfitting, avoiding
Model optimization, agents beyond DQN using, 307
ModelCheckpoint()
object, dense sentiment classifier, 231–232
Modeling process, building own project, 318–321
Monte Carlo tree search (MCTS) algorithm, 61, 66
Morphemes, natural human language, 34
Morphology, natural human language, 34–35
most_similar()
method, word2vec, 212–213
Motion, detecting in visual cortex, 7–8
Mountain Car game, 316
MRI (magnetic resonance imaging), and visual cortex, 7–8
Müller, Vincent, 72
Multi ConvNet Sentiment Classifier Jupyter notebook, 320
MXNet, deep learning library, 324
n-dimensional spaces, 42–43, 339
Natural human language, elements of, 33–35
Natural language classification
dense network classifier architecture, 229–235
with familiar networks, 222
loading IMDb film reviews, 222–226
standardizing length of reviews, 228–229
Natural Language Preprocessing Jupyter notebook, 197
Natural language processing (NLP)
building own deep learning project, 315–316
building socially beneficial projects, 318
computational representations of. See Language, computational representations of
deep learning approaches to, 53
Google Duplex as deep-learning, 35–37
history of deep learning, 24–25
learning representations automatically, 22–23
natural human language elements of, 33–35
natural language classification in. See Natural language classification
networks designed for sequential data, 240–251
non-sequential architectures, 251–256
overview of, 195
preprocessing. See Preprocessing natural language data
Software 2.0 and, 326
summary, 256
transfer learning in, 251
word embedding with word2vec. See word2vec
n_components, plotting word vectors, 214
Negative rewards, reinforcement learning problems and, 56
Negative sampling, training word2vec, 208
Neocognitron
LeNet-5 advantages over, 13–14
Nesterov momentum optimizer, stochastic gradient descent, 146
Network architecture, regression model, 150–151
Network depth, as hyperparameter, 125–126
Neural Information Processing Systems (NIPS) conference, 41
Neural machine translation (NMT), seq2seq models, 250
Neural networks
building deep in PyTorch, 343–344
coding shallow in Keras, 83
Neuron saturation. See Saturated neurons
Neurons
AlexNet vs. LeNet-5, 17
behaviors of biological, 85–86
forming primary visual cortex, 4–7
regions processing visual stimuli in visual cortex, 7–8
TensorFlow Playground and, 17–19
tuning hidden-layer count and number of, 126
next_state, DQN agent gameplay, 298
NIPS (Neural Information Processing Systems) conference, 41
n_iter, plotting word vectors, 214
NLP. See Natural language processing (NLP)
NMT (neural machine translation), seq2seq models, 250
Noë, Alva, 39
Non-sequential model architecture, 251–256
Non-trainable params, model object, 109–110
Nonlinearity, of ReLU neuron, 95
Notation, formal neural network, 333–334
Number of epochs of training
as hyperparameter, 122
rule of thumb for learning rate, 119
stochastic gradient descent and, 119–122
training deep learning model, 83–84
NumPy, PyTorch compatibility with, 324
Object detection
with Fast R-CNN, 184
as machine vision application, 182–183
understanding, 183
Objective function, maximizing reward with, 290
Objects
recognition tasks of machine vision, 52–53
Occam’s razor, neuron count and, 126
On-device processing, machine learning for, 46–48
One-hot format
computational representations of language via, 25–26
converting integer labels to, 82–83
localist vs. distributed representations, 32–33
Online resources
building deep learning projects, 317–318
pretrained word vectors, 230
OpenAI Gym
building deep learning projects, 316
deep reinforcement learning, 68–70
interacting with environment, 300–303
Optimal policy
building neural network model for, 288–290
estimating optimal action via Q-learning, 290–292
Optimal Q-value (Q*), estimating, 291–292
Optimization
agents beyond DQN using, 306–307
fancy optimizers for stochastic gradient descent, 145–147
hyperparameter optimizers, 130, 303–306
minimizing cost via. See Cost, minimizing via optimization
stochastic gradient descent. See Stochastic gradient descent (SGD)
Output layer
artificial neural network with, 99
batch normalization and, 139
building network model for DQN agent, 298
calculus behind backpropagation, 335, 337
deep learning model architectures, 51–52
LSTM, 245
notation for neural networks, 334
schematic diagram of shallow network, 79
softmax layer for multiclass problems, 106–108
softmax layer of fast food-classifying network, 106–107
TensorFlow Playground demo, 100
Overfitting, avoiding
building your own project, 320
data augmentation, 145
Pac-Man
discount factor (decay) and, 288–289
DQN agent initialization and, 296
Padding
convolutional example of, 163–167
as convolutional filter hyperparameter, 167–168
standardizing length of IMDb film reviews, 228–229
Parameter initialization, building own project, 319
Parameters. See also Hyperparameters
Cart-Pole DQN agent initialization, 295–297
creating dense network classifier architecture, 230–232
escaping local minimum, 122–124
gradient descent minimizing cost across multiple, 116–117
pooling layers reducing overall, 169–170
saving model, 300
weight initialization, 132–135
Parametric ReLU activation function, 96
Partial-derivative calculus, cross-entropy cost, 114–115
Patches, in convolutional layers, 160
PCA (principal component analysis), 213
Perceptrons
choosing, 96
hot dog/not hot dog detector example, 86–90
modern neurons vs., 91
as most important equation in this book, 90–91
overview of, 86
Performance
hyperparameter optimization using SLM Lab, 303–306
Software 2.0 and, 326
PG. See Policy gradient (PG) algorithm
Phonemes, natural human language and, 34
Phonology, natural human language and, 34–35
Photorealistic images, creating. See Machine art
Phraser()
method, NLP, 202–203, 204–205
Phrases()
method, NLP, 202–203, 204–205
pix2pix web application, 45–46
Pixels
computational complexity and, 160
converting integers to floats, 82
convolutional example of, 163–167
convolutional layers and, 160–162
handwritten MNIST digits as, 77–78
kernel size hyperparameter of convolutional filters, 167
reformatting data for shallow net, 81–83
schematic diagram of shallow network, 78–79
two-dimensional imagery and, 159–160
Plotting
GAN training accuracy, 281
Policy function (π), discounted future reward, 288–290
Policy gradient (PG) algorithm
actor-critic using Q-learning with, 307–308
in deep reinforcement learning, 68
REINFORCE algorithm as, 307
Policy networks, AlphaGo, 61
Policy optimization
agents beyond DQN using, 307
building neural network model for, 288–290
estimating optimal action via Q-learning, 290–292
RL agent using actor-critic with, 307–308
Positive rewards, deep reinforcement learning, 56, 57
Prediction
selecting action for DQN agent, 300
training dense sentiment classifier, 232
training DQN agent via memory replay, 299
word2vec using predictive models, 208
Preprocessing natural language data
converting all characters to lowercase, 199–200
removing stop words and punctuation, 200–201
stemming, 201
Principal component analysis (PCA), 213
Probability distribution, Markov decision processes, 288
Processing power, AlexNet vs. LeNet-5, 16–17
Project Gutenberg. See Preprocessing natural language data
Punctuation
processing full corpus, 204–206
Python, for example code in this book, 75–76
PyTorch
building deep neural network in, 343–344
deep learning library, 323–324
installation, 341
Q-learning networks
actor-critic combining PG algorithms with, 307–308
DQNs. See Deep Q-learning networks (DQNs)
Q-value functions
agents beyond DQN optimizing, 306
drawbacks of DQN agents, 306
training DQN agent via memory replay, 299
Quake III Arena, DeepMind Lab built on, 69
Quick, Draw! game
for hundreds of machine-drawn sketches, 48
introduction to deep learning, 19
R-CNN
Fast R-CNN, 184
object detection application, 183–184
RAM (memory), batch size/stochastic gradient descent and, 119–122
rand
function, DQN agent action selection, 299–300
randrange
function, DQN agent action selection, 300
Rectified linear unit neurons. See ReLU (rectified linear unit) neurons
Recurrent neural networks (RNNs)
LSTM cell as layer of NLP in, 53
stacked recurrent models, 248–250
Reformatting data, coding shallow network, 81–83
Regions of interest (ROIs)
developing Faster R-CNN, 184–185
image segmentation with Mask R-CNN, 187
object detection with Fast R-CNN, 184
object detection with R-CNN, 183–184
Regression, improving deep networks, 149–152
REINFORCE algorithm, agents beyond DQN using, 307
Reinforcement learning
building socially beneficial projects, 318
overview of, 49
problems of machine learning, 54–56
as sequential decision-making problems, 284
Reinforcement Learning: An Introduction (Sutton and Barto), 292
Reinforcement learning, deep
board games. See Board games
building own project. See Deep learning projects, building own
essential theory of deep Q-learning networks, 290–292
essential theory of reinforcement learning, 283–286
game-playing applications. See Game-playing machines
hyperparameter optimization with SLM Lab, 303–306
interacting with OpenAI Gym environment, 300–303
manipulation of objects, 67–68
Markov decision processes, 286–288
popular learning environments for, 68–71
summary, 308
ReLU (rectified linear unit) neurons
with Glorot distributions, 136–137
neural network model for DQN agent, 297
as preferred neuron type, 96
TensorFlow Playground demo, 100
Representation learning, 22, 51
requires_grad
argument, PyTorch, 342
Residual networks (ResNets), 180–182
Resources, building deep learning projects, 317–318
return_sequences=True, stacking recurrent layers, 248
Reward(s)
deep Q-learning network theory, 290–292
DeepMind DQN and, 59
DQN agent gameplay, 298
Markov decision processes (MDPs), 287–289
reinforcement learning problems and, 56
theory of reinforcement learning, 283
training DQN agent via memory replay, 298–299
Ridge regression, reducing model overfitting, 141–142
RMSProp optimizer, 147
ROC AUC metric
as area under ROC curve, 217–218
for sentiment classifier model architectures, 256
ROIs. See Regions of interest (ROIs)
Round of training, stochastic gradient descent, 120–121
Running time, Software 2.0 and, 325
Sabour, Sara, 192
Saturated neurons
as flaw in calculating quadratic cost, 112–113
minimizing impact using cross-entropy cost, 113–115
reducing with cross-entropy cost and weight initialization, 131–135
weight initialization, Glorot normal distribution, 136
Saving model parameters, 300
Schematic diagram
activation values in feature map of convolutional layer, 164
coding shallow network in Keras, 77–79
of discriminator network, 268
of generator network, 270
of LSTM, 245
of recurrent neural network, 241
wide and deep modeling, 317
Schmidhuber, Jürgen, 244
Search, automating hyperparameter, 321
Sedol, Lee, 62
See-in-the-Dark dataset, image processing, 47–48
Semantics, natural human language and, 34–35
sentences
argument, word2vec, 210
Sentiment classifier
LSTM architecture, 247
non-sequential architecture example, 251–255
performance of model architectures, 256
seq2seq (sequence-to-sequence), and attention, 250
Sequential decision-making problems, 284
Sequential model, building for DQN agent, 297–298
sg
argument, word2vec, 210
SG (skip-gram) architecture, 207, 208
SGD. See Stochastic gradient descent (SGD)
Shadow Dexterous Hand, OpenAI Gym, 70
Shallow network
coding. See Coding shallow network in Keras
intermediate-depth neural network in, 127–129
Short-term memory, LSTM, 245–246
Sigmoid Function Jupyter notebook, 105
Sigmoid neuron(s)
for binary classification problems, 100–101, 105–106
choosing, 96
for shallow net in Keras, 79, 83
softmax function with single neuron equivalent to using, 108
weight initialization and, 133–137
Similarity score, running word2vec, 212–213
Simple neurons
forming primary visual cortex, 6–7
SimpleRNN()
layer, RNN sentiment classifier, 243
size
argument, word2vec, 210
Skip-gram (SG) architecture, 207, 208
Socially beneficial projects, deep learning projects, 318
Softmax layer, fast food-classifying network, 106–108
Softmax probability output, Fast R-CNN, 184
Software dependencies, shallow net in Keras, 80
Software 2.0, deep learning models, 324–326
Speech recognition, NLP in, 24
Spell-checkers, 24
Squared error, as quadratic cost, 112
Stacked recurrent models, 248–250
StackGAN, photorealistic images from text, 45–46
State(s)
deep Q-learning network theory and, 290–292
DeepMind DQN and, 58
DQN agent, remembering gameplay, 298
Markov decision processes and, 286
optimal policy in deep reinforcement learning and, 289–290
reinforcement learning problems and, 56
reinforcement learning via Cart-Pole game and, 286
theory of reinforcement learning, 284
Static scatterplot, plotting word vectors, 214–216
Stemming, word
overview of, 201
preprocessing natural language via, 196
Stochastic gradient descent (SGD)
escaping local minimum of cost via, 122–124
training deep networks using batch size and, 119–124
Stop words
how to remove, 200
Stride length
as convolutional filter hyperparameter, 167
reducing computational complexity, 170
Suleyman, Mustafa, 58
Supervised learning problems, machine learning, 53–54
Support vector machines, R-CNN, 183–184
Sutton, Richard, 292
Tanh neurons
activation function of, 94
choosing, 96
with Glorot distributions, 136–137
Target word
converting natural words to word vectors, 27–28
Tensor processing units (TPUs), Google training neural networks, 64
TensorBoard dashboard, 152–154
TensorFlow Playground, 17–19, 100
Tensors, PyTorch
automatic differentiation in, 342–343
building deep neural network, 343–344
compatibility with NumPy operations, 324
Terminal state, theory of reinforcement learning, 284
Text, creating photorealistic images from, 45–46
Text-to-speech (TTS) engine, Google Duplex, 36–37
Theano, deep learning library, 324
Theory, essential
of deep Q-learning networks, 290–292
of reinforcement learning, 283–284
Threshold value, perceptron equation, 89–91
Tokenization
natural human language and, 35–36
preprocessing natural language, 195, 197–199
Torch, PyTorch as extension of, 323–324
torch.nn.NLLLoss()
function, PyTorch, 344
TPUs (tensor processing units), Google training neural networks, 64
Traditional machine learning (ML) approach
deep learning approach vs., 11–12
entrants into ILSVRC using, 14–15
natural human language in, 33–35
one-hot encoding of words in, 25–26
train()
method
training DQN agent, 299
Training
AlphaGo vs. AlphaGo Zero, 63–65
Training deep networks
batch size and stochastic gradient descent, 119–122
coding shallow network in Keras, 83–84
convolutional sentiment classifier, 238
data augmentation for, 145
deep neural network in Keras, 147–149
dense sentiment classifier, 232
escaping local minimum, 122–124
generative adversarial networks (GANs), 259–262, 275–281
intermediate-depth neural network, 128–129
intermediate net in Keras, 127–129
key concepts, 130
minimizing cost via optimization, 115
overview of, 111
preventing overfitting with dropout, 142–145
recurrent neural networks (RNNs), 241
running word2vec, 208
transfer learning model of, 188–192
tuning hidden-layer and neuron counts, 125–126
via memory replay for DQN agent, 298–299
Transfer learning
natural language and, 230
in NLP, 251
Truncation, standardizing film review length, 228–229
TSNE()
method, plotting word vectors, 214–216
TTS (text-to-speech) engine, Google Duplex, 36–37
Two-dimensional images, flattening to one dimension, 82
Two-dimensional structure of visual imagery
retaining in convolutional layers, 167
retaining using LeNet-5 in Keras, 172
U-Net, image segmentation, 187–188
ULMFiT (universal language model fine-tuning), transfer learning, 251
United States Postal Service, LeNet-5 reading ZIP codes, 11
Unity ML-Agents plug-in, 71, 304
Unstable gradients, improving deep networks, 137–139
Unsupervised learning problems, machine learning, 54
Value functions, Q-learning, 291–292
Value networks, AlphaGo algorithm, 61
Value optimization
agents beyond DQN using, 306
RL agent using actor-critic algorithm and, 307–308
Vanishing gradient problem
in artificial neural networks, 137–138
performance degradation in deep CNNs, 179–180
Vector space
embeddings. See Word vectors
latent space similarities to, 42–43
word meaning represented by three dimensions, 27–29
Visual imagery, two-dimensional structure of, 159–160
Visual perception, 3–8
WaveNet, Google Duplex TTS engine, 36–37
Weight initialization, 131–137
Weighted sum, perceptron algorithm, 86–89
Weight(s)
backpropagation and, 125, 335–337
convolutional example of, 163–167
of kernels in convolutional layers, 160–162
minimizing cost via gradient descent, 115–116
notation for neural networks, 334
Wide and deep modeling approach, Google, 317
Wiesel, Torsten
LeNet-5 model built on work of, 10–12
machine vision using work of, 8–9
research on visual cortex, 4–7
window
argument, word2vec, 210
Wittgenstein, Ludwig, 21
Word embeddings. See Word vectors
Word vectors. See also word2vec
capturing word meaning, 195
computational representations. See Language, computational representations of
convolutional filters detecting triplets of, 239
evaluating, 209
localist vs. distributed representations, 32–33
in NLP. See Natural language processing (NLP)
online pretrained, 230
training on natural language data, 229–230
word2viz tool for exploring, 30–32
word2vec
converting natural words to word vectors, 28
essential theory behind, 206–209
evaluating word vectors, 209
fastText as leading alternative to, 209
plotting word vectors, 213–217
word embeddings, 206
Words
creating embeddings with word2vec. See word2vec
natural human language and, 33–35
preprocessing natural language. See Preprocessing natural language data
word_tokenize()
method, natural language, 199
workers
argument, word2vec, 211
Xavier Glorot distributions, improving deep networks, 135–137
Yelp review polarity, 316
Zhang, Xiang, 315