`act()` method, defining DQN agent, 299–300, 301

Action potential, of biological neurons, 85–86

Action(s)

deep Q-learning network theory, 290–292

DeepMind DQN and, 59

Markov decision processes and, 286

reinforcement learning problems and, 55–56

Activation functions

calculus behind backpropagation, 335–336

choosing neuron type, 96

convolutional example, 164–166

nonlinear nature of in deep learning architectures, 95

softmax layer of fast food-classifying network, 106–108

tanh neuron, 94

Activation maps

convolutional networks and, 238

in discriminator network, 267–268

in generator network, 269, 272

LeNet-5 ConvNet architecture, 173–175

as output from convolutional kernels, 163–167

pooling layers spatially reducing, 169–170

Actor-critic algorithm, RL agent, 307–308

AdaGrad optimizer, 146

Adaptive moment estimation (Adam) optimizer, 147, 148–149

Adversarial network, GANs, 272–274

Agent(s)

deep Q-learning network theory, 290–292

deep reinforcement learning and, 57

DeepMind DQN, 58

DQN. *See* DQN agents

optimal policy in deep reinforcement learning, 289–290

reinforcement learning problems of machine learning, 54–56

reinforcement learning theory, 283

SLM Lab, 304

AGI (artificial general intelligence), 72, 326–328

AI. *See* Artificial intelligence (AI)

AlexNet

CNN model inspired by, 176–177

history of deep learning for NLP, 25

ReLU neurons in, 95

Algorithms, development of AGI and, 327

AlphaGo Master, 64

Amazon review polarity, NLP training/validation samples, 316

ANI (artificial narrow intelligence), 72, 326–327

Architecture

AlexNet hierarchical, 16

bidirectional LSTM sentiment classifier, 248

convolutional sentiment classifier, 237–238

deep learning model, 52

deep net in Keras model, 148–149

dense sentiment classifier, 229–231

generalist neural network as single network, 52

intermediate-depth neural network, 127–128

LeNet-5 hierarchical, 9–11, 172–176

LSTM, 247

multi-ConvNet sentiment classifier, 253–254

regression model network, 150–151

residual network, 182

RNN sentiment classifier, 243–244

shallow neural network, 78–79, 83

stacked recurrent model, 249

TensorFlow Playground, 17

weight initialization, 133–135

Arithmetic

Art. *See* Machine art

Artificial general intelligence (AGI), 72, 326–328

Artificial intelligence (AI)

deep learning for NLP relevant to, 53

driven by deep learning, 52

general-purpose learning algorithms for, 58

history of chess and, 65

machine learning as subset of, 50

OpenAI Gym environments as, 68–70

Artificial narrow intelligence (ANI), 72, 326–327

Artificial neural networks (ANNs). *See also* Artificial neurons, constituting ANNs

AlphaGo Zero development, 63

architecture for shallow networks, 83–84

building model for DQN agent, 297–298

deep reinforcement learning using, 56–57

dominating representation learning, 51

hot dog-detecting dense network, 101–106

input layer, 99

key concepts, 110

manipulation of objects via, 67–68

schematic diagram of Jupyter network, 77–79

softmax layer of fast food-classifying network, 106–108

summary, 110

Artificial neurons

deep learning and, 22

deep learning model architectures, 51–52

Artificial neurons, constituting ANNs

biological neuroanatomy, 85–86

choosing neuron type, 96

hot dog/not hot dog detector, 86–90

key concepts, 97

modern neurons/activation functions, 91–95

most important equation in this book, 90–91

overview of, 85

perceptrons as early, 86

summary, 96

tanh neuron, 94

Artificial super intelligence (ASI), 72, 327

`astype()` method, LeNet-5 in Keras, 172

Attention, seq2seq and, 250

Backpropagation

of bidirectional LSTMs, 247

cross-entropy costs and, 114

enabling neural networks to learn, 113

minimizing cost, 115

partial-derivative calculus behind, 335–337

training recurrent neural networks, 241

tuning hidden-layer and neuron counts, 125–126

BAIR (Berkeley Artificial Intelligence Research) Lab, 44–45

Batch normalization

deep neural networks in Keras, 148

improving deep networks, 138–139

network architecture regression model, 150–151

Batch size

of 1, also known as online learning, 124

building own project and, 320

escaping local minimum of cost, 122–124

as hyperparameter, 119

and stochastic gradient descent, 119–122

Behavioral cloning, 307

Benchmarking performance, SLM Lab, 304

Bengio, Yoshua

Turing Award for deep learning, 15

weight initialization and Glorot normal distribution, 135–137

Berkeley Artificial Intelligence Research (BAIR) Lab, 44–45

BERT (bidirectional encoder representations from transformers), NLP, 251

beta (*β*) hyperparameter, batch normalization adding, 139

Bidirectional encoder representations from transformers (BERT), NLP, 251

bias (*b*)

adding to convolutional layers, 162

in convolutional example, 164

minimizing cost via gradient descent, 115–116

notation for neural networks, 333

Bidirectional LSTMs (Bi-LSTMs), 247–249

Binary-only restriction, of perceptrons, 91–92

Biological neurons

creating perceptron algorithm with, 86

ReLU neuron activation function, 94–95

Board games

overview of, 59

`boston_housing` dataset, 149–150

Bostrom, Nick, 72

Bounding boxes, developing YOLO, 185–186

`build_discriminator` function, 266–268

Caffe, deep learning library, 324

Calculus, in backpropagation, 335–337

`callbacks` argument, dense sentiment classifier, 232

Cambrian explosion, 3

Capsule networks, machine vision and, 192

Cart-Pole game

defining DQN agent for. *See* DQN agents

DQN agent interaction with OpenAI Gym, 300–303

estimating optimal Q-value, 292

hyperparameter optimization using SLM Lab, 304

Markov decision processes in, 288–289

as reinforcement learning problem, 284–286

CartPole, OpenAI Gym environment, 70

CBOW (continuous bag of words), word2vec, 207, 208

Cell body, biological neurons, 85–86

Cerebral cortex, processing visual information, 3–4

cGAN (conditional GAN), 45

Chain rule of calculus, backpropagation and, 124

Chatbots, natural language processing in, 23–24

Checkpoints, dense sentiment classifier, 231

Chen, Chen, deep learning image processing, 47–48

Chess

vs. Go board complexity, 59, 61

Classification

adding layers to transfer learning model, 190

convolutional sentiment classifier, 235–239

of film reviews by sentiment, 229–235

natural language. *See* Natural language classification

as supervised learning problem, 53–54

CNNs. *See* Convolutional neural networks (CNNs)

CNTK, deep learning library, 324

Coding shallow network in Keras

designing neural network architecture, 83

installation, 76

MNIST handwritten digits, 76–77

schematic diagram of network, 77–79

software dependencies for shallow net, 80

summary, 84

training deep learning model, 83–84

Color, visual cortex detects, 7–8

Compiling

adversarial network, 274

dense sentiment classifier, 231

discriminator network, 269

network model for DQN agent, 298

Complex neurons

forming primary visual cortex, 6–7

neocognitron and, 9

Computational complexity

minimizing number of kernels to avoid, 163

from piping images into dense networks, 160

Computational homogeneity, with Software 2.0, 325

Computing power, AGI and development of, 327

Conditional imitation learning algorithms, 307

Content generation, building socially beneficial projects, 318

Context words, running word2vec, 207–209

Continuous bag of words (CBOW), word2vec, 207, 208

Continuous variable, supervised learning problem, 54

Contracting path, U-Net, 187–188

`Conv2D` dependency, LeNet-5 in Keras, 171–174

Convolutional filter hyperparameters, CNNs, 168–169

Convolutional layers

convolutional neural networks (CNNs) and, 160–162

general approach to CNN design, 176

working with pooling layers, 169–170

Convolutional layers, GANs

birth of GANs, 41

convolutional neural networks (CNNs) and, 52–53

results of latent space arithmetic, 42–44

Convolutional neural networks (CNNs)

computational complexity, 160

contemporary machine vision and, 52–53

convolutional filter hyperparameters, 168–169

DeepMind DQN using, 58

detecting spatial patterns among words, 235–239

developing Faster R-CNN, 184–185

general approach to CNN design, 176

image segmentation with Mask R-CNN, 187

manipulation of objects via, 67–68

model inspired by AlexNet, 176–178

model inspired by VGGNet, 178–179

object detection with Fast R-CNN, 184

object detection with R-CNN, 183–184

overview of, 159

transfer learning model of, 188–192

two-dimensional structure of visual imagery, 159–160

Convolutional sentiment classifier, 235–239, 252–256

`convTranspose` layers, in generator networks, 270

Corpus

one-hot encoding of words within, 25–26

word2vec architectures for, 208

Cortes, Corinna, curating MNIST dataset, 77–78

Cost (loss) functions

building own project, 319

in stochastic gradient descent, 120

training deep networks and, 111

using backpropagation to calculate gradient of, 124–125, 335–337

Cost, minimizing via optimization

batch size and stochastic gradient descent, 119–122

escaping local minimum, 122–124

training deep networks and, 115

Count based, GloVe as, 208

Cross-entropy cost

essential GAN theory, 262

minimizes impact of neuron saturation, 113–115, 131

Data

augmentation, training deep networks, 145

development of AGI and, 327

Data generators, training, 190–191

DataFrame, IMDb validation data, 234

Datasets, deep reinforcement learning using larger, 57

De-convolutional layers, generator networks, 269–270, 272

deCNN, generator network as, 270

Deep Blue, history of chess, 65

Deep learning

code. *See* Coding shallow network in Keras

computational representations of language. *See* Language, computational representations of

definition of, 22

elements of natural human language in, 33–35

Google Duplex as NLP based on, 35–37

natural language processing and, 23–25, 37

networks learn representations automatically, 22–23

reinforcement learning combined with. *See* Reinforcement learning, deep

training deep networks. *See* Training deep networks

Deep learning, introduction

machine vision. *See* Machine vision

Quick, Draw! game, 19

summary, 20

traditional machine learning vs., 11–12

Deep learning projects, building own

artificial general intelligence approach, 326–328

converting existing machine learning project, 316–317

deep learning libraries, 321–324

deep reinforcement learning, 316

machine vision and GANs, 313–315

modeling process, including hyperparameter tuning, 318–321

natural language processing, 315–316

overview of, 313

resources for further projects, 317–318

Deep networks, improving

deep neural network in Keras, 147–149

model generalization (avoiding overfitting), 140–145

overview of, 131

summary, 154

weight initialization, 131–135

Xavier Glorot distributions, 135–137

Deep Q-learning networks (DQNs)

DeepMind video game and, 58–60

defining DQN agent. *See* DQN agents

Deep reinforcement learning. *See* Reinforcement learning, deep

DeepMind

AlphaGo Zero board game, 62–65

Google acquiring, 59

DeepMind Lab

building own deep learning project with, 316

deep reinforcement learning, 69, 71

Dendrites, and biological neurons, 85–86

Denormalization, in batch normalization, 139

Dense layers

architecting intermediate net in Keras, 127–128

artificial neural networks with, 99–100

CNN model inspired by AlexNet, 177–178

computational complexity and, 160

convolutional layers vs., 168

deep learning and, 51

Fast R-CNN and, 184

general approach to CNN design, 176

LeNet-5 in Keras and, 172–173, 175–176

multi-ConvNet model architecture, 253–255

in natural language processing, 224–225, 230–231, 236–238

networks designed for sequential data, 243

in shallow networks, 109

using weight initialization for deep networks, 132–133, 137

in wide and deep model architecture, 317

Dense network

building socially beneficial projects, 318

defined, 100

revisiting shallow network, 108–110

softmax layer of fast food-classifying network, 106–108

Dense sentiment classifier, 229–235

*Dense Sentiment Classifier* Jupyter notebook. *See* Natural language classification

Dependencies

Cart-Pole DQN agent, 293

convolutional sentiment classifier, 236

LeNet-5 in Keras, 171

loading GAN for Quick, Draw! game, 264–265

loading IMDb film reviews, 222–223

preprocessing natural language, 197

regression model, 150

TensorFlow with Keras layers, 323

Dimensionality reduction, plotting word vectors, 213–217

Discount factor (decay), Markov decision processes, 288–289

Discounted future reward

expected, 290

maximizing, 290

Discriminator network, GANs

code for training, 277

Distributed representations, localist representations vs., 32

Dot product notation, perceptron equation, 90–91

DQN agents

building neural network model for, 297–298

drawbacks of, 306

hyperparameter optimization using SLM Lab, 304

initialization parameters, 295–297

interacting with OpenAI Gym environment, 300–303

remembering gameplay, 298

selecting action to take, 299–300

training via memory replay, 298–299

DQNs. *See* Deep Q-learning networks (DQNs)

Dropout

for AlexNet in Keras, 177

for deep neural network in Keras, 148

Eager mode, TensorFlow, 322–323

Ease of use, Software 2.0, 326

Efros, Alexei, 44

ELMo (embeddings from language models), transfer learning, 251

Elo scores

AlphaZero game, 66

Encoder-decoder structure, NMT, 250

Environment(s)

DeepMind DQN, 58

popular deep reinforcement learning, 68

reinforcement learning problems of machine learning, 54–56

reinforcement learning theory, 283

training agents simultaneously via SLM Lab in multiple, 304

Epochs of training, checkpointing model parameters after, 231–232

Essential theory. *See* Theory, essential

`exp` function, softmax layer of fast food-classifying network, 106–108

Expanding path, U-Net, 187–188

Experiment graph, SLM Lab, 304

Expertise, subject-matter

AutoNet reducing requirement for, 17

deep learning easing requirement for, 22–23

Exploding gradients, ANNs, 138

Extrinsic evaluations, evaluating word vectors, 209

Face detection

arithmetic on fake human faces, 41–44

birth of generative adversarial networks, 39–41

engineered features for robust real-time, 12–13

in visual cortex, 8

Facebook, fastText library, 33

False negative, IMDb reviews, 236

False positive, IMDb reviews, 235

Fan Hui, AlphaGo match, 62

Fancy optimizers, deep network improvement, 145–147

Fashion-MNIST dataset, deep learning project, 313–315

Fast food-classifying network, softmax layer of, 106–108

Fast R-CNN, object detection, 184

Faster R-CNN, object detection, 184–185

Feature engineering

AlexNet automatic vs. expertise-driven, 17

defined, 11

traditional machine learning and, 12–13

traditional machine learning vs. deep learning, 11–12

Feature maps

convolutional example of, 163–167

image segmentation with U-Net, 188

transfer learning model and, 188–192

Feedforward neural networks, training, 241

FetchPickAndPlace, OpenAI Gym, 70

Figure Eight

image-classification model, 315

natural language processing model, 316

Filters. *See* Kernels (filters)

Finn, Chelsea, 67

`fit_generator()` method, transfer learning, 191–192

Fitting, dense sentiment classifier, 232

`Flatten` layer, LeNet-5 in Keras, 171–174

`FloatTensor`, PyTorch, 339

`for` loop, GAN training, 275–281

Formal notation, neural networks, 333–334

Forward propagation

backpropagation vs., 124

defined, 103

in hot dog-detecting dense network, 101–106

notation for neural networks, 334

in softmax layer of fast food-classifying network, 106–108

in stochastic gradient descent, 120, 121

Frozen Lake game, 316

Fukushima, Kunihiko, neocognitron, 9–12

Fully connected layer (as dense layer), 99

Functional API, non-sequential architectures and Keras, 251–256

Fusiform face area, detecting in visual cortex, 8

Game-playing machines

artificial intelligence, 49–50

artificial neural networks (ANNs), 51

categories of machine learning problems, 53–56

deep reinforcement learning, 56–57

machine learning, 50

manipulation of objects, 67–68

natural language processing, 53

popular deep reinforcement learning environments, 68–71

representation learning, 51

Software 2.0 and, 326

summary, 72

gamma (*γ*), batch normalization adding, 139

GANs. *See* Generative adversarial networks (GANs)

Gated recurrent units (GRUs), 249–250

`gberg_sents`, tokenizing natural language, 199

Generative adversarial networks (GANs)

actor-critic algorithm reminiscent of, 308

adversarial network component, 272–274

arithmetic on fake human faces, 41–44

building and tuning own, 315

creating photorealistic images from text, 45–46

discriminator network component, 266–269

generator network component, 269–272

high-level concepts behind, 39

image processing using deep learning, 47–48

making photos photorealistic, 45

Quick, Draw! game dataset, 263–266

reducing computational complexity with, 170

Software 2.0 and, 326

summary, 281

Generator network, GANs

GitHub repository, Quick, Draw! game dataset, 263

Global minimum of cost, training deep networks for, 122–124

Glorot normal distribution, improving deep networks, 135–137

GloVe

converting natural words to word vectors, 28

as major alternative to word2vec, 208

Goodfellow, Ian

arithmetic on fake human faces and, 41–44

Google Duplex technology, deep-learning-based NLP, 35–37

GPUs (graphics processing units), deep reinforcement learning, 57

Gradient descent

batch size and stochastic, 119–122

cross-entropy costs and, 114

enabling neural networks to learn, 113

escaping local minimum using, 122–124

training deep networks with batch size/stochastic, 119–122

Graesser, Laura, 304

Graphics processing units (GPUs), deep reinforcement learning, 57

GRUs (gated recurrent units), 249–250

Gutenberg, Johannes, 197

HandManipulateBlock, OpenAI Gym, 70

Handwritten digits, MNIST, 76–78

Hidden layers

artificial neural network with, 99

building network model for DQN agent, 297

calculus behind backpropagation, 337

deep learning model architectures, 51–52

dense layers within. *See* Dense layers

forward propagation in dense network through, 102–106

hot dog-detecting dense network, 101–106

neural network notation, 333–334

schematic diagram of shallow network, 79

TensorFlow Playground demo, 100

tuning neuron count and number of, 125–126

Hidden state, LSTM, 245

Hierarchical softmax, training word2vec, 208

Hinton, Geoffrey

developing capsule networks, 192

developing t-distributed stochastic neighbor embedding, 213–214

as godfather of deep learning, 14–15

Histogram of validation data

convolutional sentiment classifier, 239

dense sentiment classifier, 233–234

Hochreiter, Sepp, 244

Hot dog-detecting dense network, 101–106

Hot dog/not hot dog detector, perceptrons, 86–90

Hubel, David

LeNet-5 model built on work of, 10–12

machine vision approach using work of, 8–9

research on visual cortex, 4–7

Human and machine language. *See also* Language, computational representations of

deep learning for natural language processing, 21–25

elements of natural human language in, 33–35

Google Duplex technology, 35–37

summary, 37

Humanoid, OpenAI Gym environment, 70

Hyperparameters. *See also* Parameters

in artificial neural networks, 130

automating search for, 321

batch size, 119

convolutional filter, 163, 167

convolutional sentiment classifier, 236–237

learning rate, 118

for loading IMDb film reviews, 223–225

multi-ConvNet sentiment classifier, 253

number of epochs of training, 122

optimizing with SLM Lab, 303–306

reducing model overfitting with dropout, 144–145

RMSProp and AdaDelta, 147

ILSVRC (ImageNet Large Scale Visual Recognition Challenge)

ResNet, 182

traditional ML vs. deep learning entrants, 13–14

IMDb (Internet Movie Database) film reviews. *See* Natural language classification

Image classification

building socially beneficial projects using, 318

ILSVRC competition for, 182

machine vision datasets for deep learning, 313–315

object detection vs., 183

Image segmentation applications, machine vision, 186–188

`ImageDataGenerator` class, transfer learning, 190–191

Images

creating photorealistic. *See* Machine art

processing using deep learning, 46–48

Imitation learning, agents beyond DQN optimizing, 307

Infrastructure, rapid advances in, 327

Initialization parameters, DQN agent, 295–297

Input layer

artificial neural networks with, 99

of deep learning model architectures, 51–52

hot dog-detecting dense network, 101–106

LSTM, 245

notation for neural networks, 333

schematic diagram of shallow network, 79

TensorFlow Playground demo, 100

Installation

of code notebooks, 76

PyTorch, 341

Integer labels, converting to one-hot, 82–83

*Intermediate Net in Keras* Jupyter notebook, 127–129

Internal covariate shift, batch normalization, 138–139

Internet Movie Database (IMDb) film reviews. *See* Natural language classification

Intrinsic evaluations, word vectors, 209

`iter` argument, running word2vec, 210

Kaggle

image-classification model, 315

natural language processing model, 316

Kasparov, Garry, 65

Keng, Wah Loon, 304

Keras

AlexNet and VGGNet in, 176–179

coding in. *See* Coding shallow network in Keras

deep learning library in, 321–323

deep neural network in, 147–149

functional API, non-sequential architectures and, 251–256

implementing RNN, 242

intermediate-depth neural network in, 127–129

loading IMDb film reviews in, 225–226

parameter-adjustment in, 144

TensorBoard dashboard in, 152–154

weight initialization in, 132–135

Kernels (filters)

convolutional example of, 164–167

of convolutional layers, 160–162

number in convolutional layer, 162–163

size, convolutional filter hyperparameter, 167

Key concepts

artificial neural networks (ANNs), 110

artificial neurons that constitute ANNs, 97

deep reinforcement learning, 308–309

generative adversarial networks (GANs), 281–282

improving deep networks, 154–155

machine vision, 193

natural language processing (NLP), 256–257

training deep networks, 130

L1 vs. L2 regularization, reducing model overfitting, 141–142

Language. *See* Human and machine language

Language, computational representations of

localist vs. distributed representations, 32–33

one-hot representations of words, 25–26

overview of, 25

word2viz tool for exploring, 30–32

LASSO regression, reducing model overfitting, 141–142

Latent space

arithmetic on fake human faces in, 42–44

birth of generative adversarial networks, 40–41

Layers

deep learning model architectures, 51–52

Leaky ReLU activation function, 96

*Learn Python the Hard Way* (Shaw), 75

Learning rate

batch normalization allowing for higher, 139

building own project, 320

shortcomings of improving SGD with momentum, 146

as step size in gradient descent, 117–119

LeCun, Yann

on fabricating realistic images, 39

MNIST handwritten digits curated by, 76–78

Turing Award for deep learning, 15

Legg, Shane, 58

Lemmatization, as sophisticated alternative to stemming, 196

LeNet-5 model

Les 3 Brasseurs bar, 39

Levine, Sergey, 67

Libraries, deep learning, 321–324

Linear regression, object detection with R-CNN, 183–184

List comprehension

adding word stemming to, 201

removing stop words and punctuation, 200–201

`load()` method, neural network model for DQN agent, 300

Loading data

coding shallow network in Keras, 79–81

`load_weights()` method, loading model parameters, 232

Local minimum of cost, escaping, 122–124

Localist representations, distributed representations vs., 32–33

Long short-term memory (LSTM) cells

bidirectional (Bi-LSTMs), 247–248

implementing with Keras, 246–247

as layer of NLP, 53

Long-term memory, LSTM, 245–246

Lowercase

converting all characters in NLP to, 195–196, 199–200

processing full corpus, 204–206

LSTM. *See* Long short-term memory (LSTM) cells

LunarLander, OpenAI Gym environment, 70

Maaten, Laurens van der, 213–214

Machine art

arithmetic on fake human faces, 41–44

creating photorealistic images from text, 45–46

image processing using deep learning, 46–48

make your own sketches photorealistic, 45

overview of, 39

summary, 48

Machine language. *See* Human and machine language

Machine learning (ML). *See also* Traditional machine learning (ML) approach

overview of, 50

reinforcement learning problems of, 54–56

representation learning as branch of, 51

supervised learning problems of, 53–54

traditional machine vs. representation learning techniques, 22

unsupervised learning problems of, 54

Machine translation, NLP in, 23–24

Machine vision

AlexNet and VGGNet in Keras, 176–179

CNNs. *See* Convolutional neural networks (CNNs)

converting existing project, 316–317

datasets for deep learning image-classification models, 313–315

key concepts, 193

object recognition tasks, 52–53

Quick, Draw! game, 19

Software 2.0 and, 326

traditional machine learning approach, 12–13

Machine vision, applications of

capsule networks, 192

Fast R-CNN, 184

Mask R-CNN, 187

object detection, 183

overview of, 182

Magnetic resonance imaging (MRI), and visual cortex, 7–8

Manipulation of objects, 67–68

Markov decision process (MDP), 286–290

Mask R-CNN, image segmentation with, 186–187

Maas, Andrew, 203

matplotlib, weight initialization, 132

`max` operation, pooling layers, 170

Max-pooling layers

AlexNet and VGGNet in Keras, 176–179

`MaxPooling2D` dependency, LeNet-5 in Keras, 171–174

MCTS (Monte Carlo tree search) algorithm, 61, 66

MDP (Markov decision process), 286–290

Memory

batch size/stochastic gradient descent and, 119–122

DQN agent gameplay, 298

Software 2.0 and, 326

training DQN agent via replay of, 298–299

Metrics, SLM Lab performance, 305–306

Milestones, deep learning for NLP, 24–25

`min_count` argument, word2vec, 210–211

Minibatches, splitting training data into, 119–122

ML. *See* Machine learning (ML)

MNIST handwritten digits

calculus for backpropagation, 337

coding shallow network in Keras, 76–78

computational complexity in dense networks, 160

Fashion-MNIST dataset deep learning project, 313–315

loading data for shallow net, 80–81

loss of two-dimensional imagery in dense networks, 159–160

reformatting data for shallow net, 81–83

schematic diagram of shallow network, 77–79

software dependencies for shallow net, 80

in stochastic gradient descent, 120

training deep networks with data augmentation, 145

Model generalization. *See* Overfitting, avoiding

Model optimization, agents beyond DQN using, 307

`ModelCheckpoint()` object, dense sentiment classifier, 231–232

Modeling process, building own project, 318–321

Monte Carlo tree search (MCTS) algorithm, 61, 66

Morphemes, natural human language, 34

Morphology, natural human language, 34–35

`most_similar()` method, word2vec, 212–213

Motion, detecting in visual cortex, 7–8

Mountain Car game, 316

MRI (magnetic resonance imaging), and visual cortex, 7–8

Müller, Vincent, 72

*Multi ConvNet Sentiment Classifier* Jupyter notebook, 320

MXNet, deep learning library, 324

*n*-dimensional spaces, 42–43, 339

Natural human language, elements of, 33–35

Natural language classification

dense network classifier architecture, 229–235

with familiar networks, 222

loading IMDb film reviews, 222–226

standardizing length of reviews, 228–229

*Natural Language Preprocessing* Jupyter notebook, 197

Natural language processing (NLP)

building own deep learning project, 315–316

building socially beneficial projects, 318

computational representations of. *See* Language, computational representations of

deep learning approaches to, 53

Google Duplex as deep-learning, 35–37

history of deep learning, 24–25

learning representations automatically, 22–23

natural human language elements of, 33–35

natural language classification in. *See* Natural language classification

networks designed for sequential data, 240–251

non-sequential architectures, 251–256

overview of, 195

preprocessing. *See* Preprocessing natural language data

Software 2.0 and, 326

summary, 256

transfer learning in, 251

word embedding with word2vec. *See* word2vec

`n_components`, plotting word vectors, 214

Negative rewards, reinforcement learning problems and, 56

Negative sampling, training word2vec, 208

Neocognitron

LeNet-5 advantages over, 13–14

Nesterov momentum optimizer, stochastic gradient descent, 146

Network architecture, regression model, 150–151

Network depth, as hyperparameter, 125–126

Neural Information Processing Systems (NIPS) conference, 41

Neural machine translation (NMT), seq2seq models, 250

Neural networks

building deep in PyTorch, 343–344

coding shallow in Keras, 83

Neuron saturation. *See* Saturated neurons

Neurons

AlexNet vs. LeNet-5, 17

behaviors of biological, 85–86

forming primary visual cortex, 4–7

regions processing visual stimuli in visual cortex, 7–8

TensorFlow Playground and, 17–19

tuning hidden-layer count and number of, 126

`next_state`, DQN agent gameplay, 298

NIPS (Neural Information Processing Systems) conference, 41

`n_iter`, plotting word vectors, 214

NLP. *See* Natural language processing (NLP)

NMT (neural machine translation), seq2seq models, 250

Noë, Alva, 39

Non-sequential model architecture, 251–256

Non-trainable params, model object, 109–110

Nonlinearity, of ReLU neuron, 95

Notation, formal neural network, 333–334

Number of epochs of training

as hyperparameter, 122

rule of thumb for learning rate, 119

stochastic gradient descent and, 119–122

training deep learning model, 83–84

NumPy

Object detection

with Fast R-CNN, 184

as machine vision application, 182–183

understanding, 183

Objective function (*J*), maximizing reward with, 290

Objects

recognition tasks of machine vision, 52–53

Occam’s razor, neuron count and, 126

On-device processing, machine learning for, 46–48

One-hot format

computational representations of language via, 25–26

converting integer labels to, 82–83

localist vs. distributed representations, 32–33

Online resources

building deep learning projects, 317–318

pretrained word vectors, 230

OpenAI Gym

building deep learning projects, 316

deep reinforcement learning, 68–70

interacting with environment, 300–303

Optimal policy

building neural network model for, 288–290

estimating optimal action via Q-learning, 290–292

Optimal Q-value (Q*), estimating, 291–292

Optimization

agents beyond DQN using, 306–307

fancy optimizers for stochastic gradient descent, 145–147

hyperparameter optimizers, 130, 303–306

minimizing cost via. *See* Cost, minimizing via optimization

stochastic gradient descent. *See* Stochastic gradient descent (SGD)

Output layer

artificial neural network with, 99

batch normalization and, 139

building network model for DQN agent, 298

calculus behind backpropagation, 335, 337

deep learning model architectures, 51–52

LSTM, 245

notation for neural networks, 334

schematic diagram of shallow network, 79

softmax layer for multiclass problems, 106–108

softmax layer of fast food-classifying network, 106–107

TensorFlow Playground demo, 100

Overfitting, avoiding

building your own project, 320

data augmentation, 145

Pac-Man

discount factor (decay) and, 288–289

DQN agent initialization and, 296

Padding

convolutional example of, 163–167

as convolutional filter hyperparameter, 167–168

standardizing length of IMDb film reviews, 228–229

Parameter initialization, building own project, 319

Parameters. *See also* Hyperparameters

Cart-Pole DQN agent initialization, 295–297

creating dense network classifier architecture, 230–232

escaping local minimum, 122–124

gradient descent minimizing cost across multiple, 116–117

pooling layers reducing overall, 169–170

saving model, 300

weight initialization, 132–135

Parametric ReLU activation function, 96

Partial-derivative calculus, cross-entropy cost, 114–115

Patches, in convolutional layers, 160

PCA (principal component analysis), 213

Perceptrons

choosing, 96

hot dog/not hot dog detector example, 86–90

modern neurons vs., 91

as most important equation in this book, 90–91

overview of, 86

Performance

hyperparameter optimization using SLM Lab, 303–306

Software 2.0 and, 326

PG. *See* Policy gradient (PG) algorithm

Phonemes, natural human language and, 34

Phonology, natural human language and, 34–35

Photorealistic images, creating. *See* Machine art

`Phraser()`

method, NLP, 202–203, 204–205

`Phrases()`

method, NLP, 202–203, 204–205

pix2pix web application, 45–46

Pixels

computational complexity and, 160

converting integers to floats, 82

convolutional example of, 163–167

convolutional layers and, 160–162

handwritten MNIST digits as, 77–78

kernel size hyperparameter of convolutional filters, 167

reformatting data for shallow net, 81–83

schematic diagram of shallow network, 78–79

two-dimensional imagery and, 159–160

Plotting

GAN training accuracy, 281

Policy function (*π*), discounted future reward, 288–290

Policy gradient (PG) algorithm

actor-critic using Q-learning with, 307–308

in deep reinforcement learning, 68

REINFORCE algorithm as, 307

Policy networks, AlphaGo, 61

Policy optimization

agents beyond DQN using, 307

building neural network model for, 288–290

estimating optimal action via Q-learning, 290–292

RL agent using actor-critic with, 307–308

Positive rewards, deep reinforcement learning, 56, 57

Prediction

selecting action for DQN agent, 300

training dense sentiment classifier, 232

training DQN agent via memory replay, 299

word2vec using predictive models, 208

Preprocessing natural language data

converting all characters to lowercase, 199–200

removing stop words and punctuation, 200–201

stemming, 201

Principal component analysis (PCA), 213

Probability distribution, Markov decision processes, 288

Processing power, AlexNet vs. LeNet-5, 16–17

Project Gutenberg. *See* Preprocessing natural language data

Punctuation

processing full corpus, 204–206

Python, for example code in this book, 75–76

PyTorch

building deep neural network in, 343–344

deep learning library, 323–324

installation, 341

Q-learning networks

actor-critic combining PG algorithms with, 307–308

DQNs. *See* Deep Q-learning networks (DQNs)

Q-value functions

agents beyond DQN optimizing, 306

drawbacks of DQN agents, 306

training DQN agent via memory replay, 299

Quake III Arena, DeepMind Lab built on, 69

Quick, Draw! game

for hundreds of machine-drawn sketches, 48

introduction to deep learning, 19

R-CNN

Fast R-CNN, 184

object detection application, 183–184

RAM (memory), batch size/stochastic gradient descent and, 119–122

`rand`

function, DQN agent action selection, 299–300

`randrange`

function, DQN agent action selection, 300

Rectified linear unit neurons. *See* ReLU (rectified linear unit) neurons

Recurrent neural networks (RNNs)

LSTM cell as layer of NLP in, 53

stacked recurrent models, 248–250

Reformatting data, coding shallow network, 81–83

Regions of interest (ROIs)

developing Faster R-CNN, 184–185

image segmentation with Mask R-CNN, 187

object detection with Fast R-CNN, 184

object detection with R-CNN, 183–184

Regression, improving deep networks, 149–152

REINFORCE algorithm, agents beyond DQN using, 307

Reinforcement learning

building socially beneficial projects, 318

overview of, 49

problems of machine learning, 54–56

as sequential decision-making problems, 284

*Reinforcement Learning: An Introduction* (Sutton and Barto), 292

Reinforcement learning, deep

board games. *See* Board games

building own project. *See* Deep learning projects, building own

essential theory of deep Q-learning networks, 290–292

essential theory of reinforcement learning, 283–286

game-playing applications. *See* Game-playing machines

hyperparameter optimization with SLM Lab, 303–306

interacting with OpenAI Gym environment, 300–303

manipulation of objects, 67–68

Markov decision processes, 286–288

popular learning environments for, 68–71

summary, 308

ReLU (rectified linear unit) neurons

with Glorot distributions, 136–137

neural network model for DQN agent, 297

as preferred neuron type, 96

TensorFlow Playground demo, 100

Representation learning, 22, 51

`requires_grad`

argument, PyTorch, 342

Residual networks (ResNets), 180–182

Resources, building deep learning projects, 317–318

`return_sequences=True`

argument, stacking recurrent layers, 248

Reward(s)

deep Q-learning network theory, 290–292

DeepMind DQN and, 59

DQN agent gameplay, 298

Markov decision processes (MDPs), 287–289

reinforcement learning problems and, 56

theory of reinforcement learning, 283

training DQN agent via memory replay, 298–299

Ridge regression, reducing model overfitting, 141–142

RMSProp optimizer, 147

ROC AUC metric

as area under ROC curve, 217–218

for sentiment classifier model architectures, 256

ROIs. *See* Regions of interest (ROIs)

Round of training, stochastic gradient descent, 120–121

Running time, Software 2.0 and, 325

Sabour, Sara, 192

Saturated neurons

as flaw in calculating quadratic cost, 112–113

minimizing impact using cross-entropy cost, 113–115

reducing with cross-entropy cost and weight initialization, 131–135

weight initialization, Glorot normal distribution, 136

Saving model parameters, 300

Schematic diagram

activation values in feature map of convolutional layer, 164

coding shallow network in Keras, 77–79

of discriminator network, 268

of generator network, 270

of LSTM, 245

of recurrent neural network, 241

wide and deep modeling, 317

Schmidhuber, Jürgen, 244

Search, automating hyperparameter, 321

Sedol, Lee, 62

See-in-the-Dark dataset, image processing, 47–48

Semantics, natural human language and, 34–35

`sentences`

argument, word2vec, 210

Sentiment classifier

LSTM architecture, 247

non-sequential architecture example, 251–255

performance of model architectures, 256

seq2seq (sequence-to-sequence), and attention, 250

Sequential decision-making problems, 284

Sequential model, building for DQN agent, 297–298

`sg`

argument, word2vec, 210

SG (skip-gram) architecture, 207, 208

SGD. *See* Stochastic gradient descent (SGD)

Shadow Dexterous Hand, OpenAI Gym, 70

Shallow network

coding. *See* Coding shallow network in Keras

intermediate-depth neural network in, 127–129

Short-term memory, LSTM, 245–246

*Sigmoid Function* Jupyter notebook, 105

Sigmoid neuron(s)

for binary classification problems, 100–101, 105–106

choosing, 96

for shallow net in Keras, 79, 83

softmax function with single neuron equivalent to using, 108

weight initialization and, 133–137

Similarity score, running word2vec, 212–213

Simple neurons

forming primary visual cortex, 6–7

`SimpleRNN()`

layer, RNN sentiment classifier, 243

`size`

argument, word2vec, 210

Skip-gram (SG) architecture, 207, 208

Socially beneficial projects, deep learning projects, 318

Softmax layer, fast food-classifying network, 106–108

Softmax probability output, Fast R-CNN, 184

Software dependencies, shallow net in Keras, 80

Software 2.0, deep learning models, 324–326

Speech recognition, NLP in, 24

Spell-checkers, 24

Squared error, as quadratic cost, 112

Stacked recurrent models, 248–250

StackGAN, photorealistic images from text, 45–46

State(s)

deep Q-learning network theory and, 290–292

DeepMind DQN and, 58

DQN agent, remembering gameplay, 298

Markov decision processes and, 286

optimal policy in deep reinforcement learning and, 289–290

reinforcement learning problems and, 56

reinforcement learning via Cart-Pole game and, 286

theory of reinforcement learning, 284

Static scatterplot, plotting word vectors, 214–216

Stemming, word

overview of, 201

preprocessing natural language via, 196

Stochastic gradient descent (SGD)

escaping local minimum of cost via, 122–124

training deep networks using batch size and, 119–124

Stop words

how to remove, 200

Stride length

as convolutional filter hyperparameter, 167

reducing computational complexity, 170

Suleyman, Mustafa, 58

Supervised learning problems, machine learning, 53–54

Support vector machines, R-CNN, 183–184

Sutton, Richard, 292

Tanh neurons

activation function of, 94

choosing, 96

with Glorot distributions, 136–137

Target word

converting natural words to word vectors, 27–28

Tensor processing units (TPUs), Google training neural networks, 64

TensorBoard dashboard, 152–154

TensorFlow Playground, 17–19, 100

Tensors, PyTorch

automatic differentiation in, 342–343

building deep neural network, 343–344

compatibility with NumPy operations, 324

Terminal state, theory of reinforcement learning, 284

Text, creating photorealistic images from, 45–46

Text-to-speech (TTS) engine, Google Duplex, 36–37

Theano, deep learning library, 324

Theory, essential

of deep Q-learning networks, 290–292

of reinforcement learning, 283–284

Threshold value, perceptron equation, 89–91

Tokenization

natural human language and, 35–36

preprocessing natural language, 195, 197–199

Torch, PyTorch as extension of, 323–324

`torch.nn.NLLLoss()`

function, PyTorch, 344

TPUs (tensor processing units), Google training neural networks, 64

Traditional machine learning (ML) approach

deep learning approach vs., 11–12

entrants into ILSVRC using, 14–15

natural human language in, 33–35

one-hot encoding of words in, 25–26

`train()`

method, training DQN agent, 299

Training

AlphaGo vs. AlphaGo Zero, 63–65

Training deep networks

batch size and stochastic gradient descent, 119–122

coding shallow network in Keras, 83–84

convolutional sentiment classifier, 238

data augmentation for, 145

deep neural network in Keras, 147–149

dense sentiment classifier, 232

escaping local minimum, 122–124

generative adversarial networks (GANs), 259–262, 275–281

intermediate-depth neural network, 128–129

intermediate net in Keras, 127–129

key concepts, 130

minimizing cost via optimization, 115

overview of, 111

preventing overfitting with dropout, 142–145

recurrent neural networks (RNNs), 241

running word2vec, 208

transfer learning model of, 188–192

tuning hidden-layer and neuron counts, 125–126

via memory replay for DQN agent, 298–299

Transfer learning

natural language and, 230

in NLP, 251

Truncation, standardizing film review length, 228–229

`TSNE()`

method, plotting word vectors, 214–216

TTS (text-to-speech) engine, Google Duplex, 36–37

Two-dimensional images, flattening to one dimension, 82

Two-dimensional structure of visual imagery

retaining in convolutional layers, 167

retaining using LeNet-5 in Keras, 172

U-Net, image segmentation, 187–188

ULMFiT (universal language model fine-tuning), transfer learning, 251

United States Postal Service, LeNet-5 reading ZIP codes, 11

Unity ML-Agents plug-in, 71, 304

Unstable gradients, improving deep networks, 137–139

Unsupervised learning problems, machine learning, 54

Value functions, Q-learning, 291–292

Value networks, AlphaGo algorithm, 61

Value optimization

agents beyond DQN using, 306

RL agent using actor-critic algorithm and, 307–308

Vanishing gradient problem

in artificial neural networks, 137–138

performance degradation in deep CNNs, 179–180

Vector space

embeddings. *See* Word vectors

latent space similarities to, 42–43

word meaning represented by three dimensions, 27–29

Visual imagery, two-dimensional structure of, 159–160

Visual perception

WaveNet, Google Duplex TTS engine, 36–37

Weight initialization, 131–137

Weighted sum, perceptron algorithm, 86–89

Weight(s)

backpropagation and, 125, 335–337

convolutional example of, 163–167

of kernels in convolutional layers, 160–162

minimizing cost via gradient descent, 115–116

notation for neural networks, 334

Wide and deep modeling approach, Google, 317

Wiesel, Torsten

LeNet-5 model built on work of, 10–12

machine vision using work of, 8–9

research on visual cortex, 4–7

`window`

argument, word2vec, 210

Wittgenstein, Ludwig, 21

Word embeddings. *See* Word vectors

Word vectors. *See also* word2vec

capturing word meaning, 195

computational representations. *See* Language, computational representations of

convolutional filters detecting triplets of, 239

evaluating, 209

localist vs. distributed representations, 32–33

in NLP. *See* Natural language processing (NLP)

online pretrained, 230

training on natural language data, 229–230

word2viz tool for exploring, 30–32

word2vec

converting natural words to word vectors, 28

essential theory behind, 206–209

evaluating word vectors, 209

FastText as leading alternative to, 209

plotting word vectors, 213–217

word embeddings, 206

Words

creating embeddings with word2vec. *See* word2vec

natural human language and, 33–35

preprocessing natural language. *See* Preprocessing natural language data

`word_tokenize()`

method, natural language, 199

`workers`

argument, word2vec, 211

Xavier Glorot distributions, improving deep networks, 135–137

Yelp review polarity, 316

Zhang, Xiang, 315
