Index

A

act() method, defining DQN agent, 299–300, 301

Action potential, of biological neurons, 85–86

Action(s)

deep Q-learning network theory, 290–292

DeepMind DQN and, 59

DQN agent, 298–300

Markov decision processes and, 286

reinforcement learning problems and, 55–56

Activation functions

calculus behind backpropagation, 335–336

choosing neuron type, 96

convolutional example, 164–166

Glorot distributions, 136–137

nonlinear nature of in deep learning architectures, 95

ReLU neuron, 94–95

saturated neurons, 112–113

sigmoid neuron example, 92–94

softmax layer of fast food-classifying network, 106–108

tanh neuron, 94

Activation maps

convolutional networks and, 238

in discriminator network, 267–268

Faster R-CNN, 184–185

in generator network, 269, 272

LeNet-5 ConvNet architecture, 173–175

as output from convolutional kernels, 163–167

with padding, 168–169

pooling layers spatially reducing, 169–170

U-Net, 187–188

Actor-critic algorithm, RL agent, 307–308

AdaDelta optimizer, 146–147

AdaGrad optimizer, 146

Adaptive moment estimation (Adam) optimizer, 147, 148–149

Adversarial network, GANs, 272–274

Agent(s)

beyond DQN, 306–308

deep Q-learning network theory, 290–292

deep reinforcement learning and, 57

DeepMind DQN, 58

DQN. See DQN agents

optimal policy in deep reinforcement learning, 289–290

reinforcement learning problems of machine learning, 54–56

reinforcement learning theory, 283

SLM Lab, 304

AGI (artificial general intelligence), 72, 326–328

AI. See Artificial intelligence (AI)

AlexNet

CNN model inspired by, 176–177

history of deep learning for NLP, 25

overview of, 14–17

ReLU neurons in, 95

Algorithms, development of AGI and, 327

AlphaZero, 53–54, 65–66

AlphaGo, 59–62, 63–64

AlphaGo Master, 64

AlphaGo Zero, 62–65

Amazon review polarity, NLP training/validation samples, 316

ANI (artificial narrow intelligence), 72, 326–327

Architecture

adversarial model, 273–274

AlexNet hierarchical, 16

bidirectional LSTM sentiment classifier, 248

convolutional sentiment classifier, 237–238

deep learning model, 52

deep net in Keras model, 148–149

dense sentiment classifier, 229–231

discriminator model, 266–269

generalist neural network as single network, 52

generator model, 270–272

intermediate-depth neural network, 127–128

Keras functional API, 251–256

LeNet-5 hierarchical, 9–11, 172–176

LSTM, 247

multi-ConvNet sentiment classifier, 253–254

regression model network, 150–151

residual network, 182

RNN sentiment classifier, 243–244

shallow neural network, 78–79, 83

stacked recurrent model, 249

TensorFlow Playground, 17

U-Net, 47, 187

weight initialization, 133–135

word2vec, 207–213

Arithmetic

on fake human faces, 41–44

word-vector, 29–30

Art. See Machine art

Artificial general intelligence (AGI), 72, 326–328

Artificial intelligence (AI)

categories of, 71–72

deep learning for NLP relevant to, 53

driven by deep learning, 52

general-purpose learning algorithms for, 58

history of chess and, 65

machine learning as subset of, 50

OpenAI Gym environments as, 68–70

overview of, 49–50

Artificial narrow intelligence (ANI), 72, 326–327

Artificial neural networks (ANNs). See also Artificial neurons, constituting ANNs

AlphaGo Zero development, 63

AlphaZero development, 65–66

architecture for shallow networks, 83–84

birth of GANs via, 40–41

building model for DQN agent, 297–298

deep reinforcement learning using, 56–57

dense layers, 99–100

dominating representation learning, 51

hot dog-detecting dense network, 101–106

input layer, 99

key concepts, 110

manipulation of objects via, 67–68

schematic diagram of Jupyter network, 77–79

shallow networks and, 108–110

softmax layer of fast food-classifying network, 106–108

summary, 110

Artificial neurons

deep learning and, 22

deep learning model architectures, 51–52

Artificial neurons, constituting ANNs

biological neuroanatomy, 85–86

choosing neuron type, 96

hot dog/not hot dog detector, 86–90

key concepts, 97

modern neurons/activation functions, 91–95

most important equation in this book, 90–91

overview of, 85

perceptrons as early, 86

ReLU neuron, 94–95

sigmoid neuron, 92–94

summary, 96

tanh neuron, 94

Artificial super intelligence (ASI), 72, 327

astype() method, LeNet-5 in Keras, 172

Atari Games, DeepMind, 58–59

Attention, seq2seq and, 250

Automatic differentiation, PyTorch, 342–343

B

Backpropagation

of bidirectional LSTMs, 247

cross-entropy costs and, 114

enabling neural networks to learn, 113

LeNet-5 model, 10–12

minimizing cost, 115

overview of, 124–125

partial-derivative calculus behind, 335–337

training recurrent neural networks, 241

tuning hidden-layer and neuron counts, 125–126

BAIR (Berkeley Artificial Intelligence Research) Lab, 44–45

Batch normalization

deep neural networks in Keras, 148

improving deep networks, 138–139

network architecture regression model, 150–151

Batch size

of 1, also known as online learning, 124

building own project and, 320

escaping local minimum of cost, 122–124

as hyperparameter, 119

and stochastic gradient descent, 119–122

Bazinska, Julia, 30–31

Behavioral cloning, 307

Benchmarking performance, SLM Lab, 304

Bengio, Yoshua

LeNet-5 model, 9–12

Turing Award for deep learning, 15

weight initialization and Glorot normal distribution, 135–137

Berkeley Artificial Intelligence Research (BAIR) Lab, 44–45

BERT (bi-directional encoder representations from transformers), NLP, 251

beta (β) hyperparameter

batch normalization adding, 139

optimizing SGD, 145–147

Bi-directional encoder representations from transformers (BERT), NLP, 251

bias (b)

adding to convolutional layers, 162

in convolutional example, 164

minimizing cost via gradient descent, 115–116

notation for neural networks, 333

in perceptron equation, 90–91

Bidirectional LSTMs (Bi-LSTMs), 247–249

Bigram collocation, 202–206

Binary-only restriction, of perceptrons, 91–92

Biological neurons

anatomy of, 85–86

creating perceptron algorithm with, 86

ReLU neuron activation function, 94–95

Biological vision, 8, 20

Board games

AlphaZero and, 65–66

AlphaGo and, 59–62

AlphaGo Zero and, 62–65

overview of, 59

boston_housing dataset, 149–150

Bostrom, Nick, 72

Bounding boxes, developing YOLO, 185–186

build_discriminator function, 266–268

_build_model() method, DQN agent, 296–298

Burges, Chris, MNIST dataset, 77–78

C

Caffe, deep learning library, 324

Calculus, in backpropagation, 335–337

callbacks argument

dense sentiment classifier, 232

TensorBoard, 152, 154

Cambrian explosion, 3

Capsule networks, machine vision and, 192

Cart-Pole game

defining DQN agent for. See DQN agents

DQN agent interaction with OpenAI Gym, 300–303

estimating optimal Q-value, 292

hyperparameter optimization using SLM Lab, 304

Markov decision processes in, 288–289

as reinforcement learning problem, 284–286

CartPole, OpenAI Gym environment, 70

CBOW (continuous bag of words), word2vec, 207, 208

Cell body, biological neurons, 85–86

Cell state, LSTM, 244–245

Cerebral cortex, processing visual information, 3–4

cGAN (conditional GAN), 45

Chain rule of calculus, backpropagation and, 124

Chatbots, natural language processing in, 23–24

Checkpoints, dense sentiment classifier, 231

Chen, Chen, deep learning image processing, 47–48

Chess

AlphaZero and, 65–66

vs. Go board complexity, 59, 61

Classification

adding layers to transfer learning model, 190

convolutional sentiment classifier, 235–239

of film reviews by sentiment, 229–235

natural language. See Natural language classification

as supervised learning problem, 53–54

CNNs. See Convolutional neural networks (CNNs)

CNTK, deep learning library, 324

Coding shallow network in Keras

designing neural network architecture, 83

installation, 76

loading MNIST data, 80–81

MNIST handwritten digits, 76–77

overview of, 75, 76

prerequisites, 75–76

reformatting data, 81–83

schematic diagram of network, 77–79

software dependencies for shallow net, 80

summary, 84

training deep learning model, 83–84

Color, visual cortex detects, 7–8

Compiling

adversarial network, 274

dense sentiment classifier, 231

discriminator network, 269

network model for DQN agent, 298

Complex neurons

forming primary visual cortex, 6–7

neocognitron and, 9

Computational complexity

minimizing number of kernels to avoid, 163

from piping images into dense networks, 160

Computational homogeneity, with Software 2.0, 325

Computing power, AGI and development of, 327

Conditional GAN (cGAN), 45–46

Conditional imitation learning algorithms, 307

Confusion matrix, 218–219

Content generation, building socially beneficial projects, 318

Context words, running word2vec, 207–209

Continuous bag of words (CBOW), word2vec, 207, 208

Continuous variable, supervised learning problem, 54

Contracting path, U-Net, 187–188

Conv2D dependency, LeNet-5 in Keras, 171–174

Convolutional filter hyperparameters, CNNs, 168–169

Convolutional layers

convolutional neural networks (CNNs) and, 160–162

general approach to CNN design, 176

working with pooling layers, 169–170

Convolutional layers, GANs

birth of GANs, 41

convolutional neural networks (CNNs) and, 52–53

multiple filters in, 162–163

results of latent space arithmetic, 42–44

Convolutional neural networks (CNNs)

computational complexity, 160

contemporary machine vision and, 52–53

convolutional filter hyperparameters, 168–169

convolutional layers, 160–162

DeepMind DQN using, 58

detecting spatial patterns among words, 235–239

developing Faster R-CNN, 184–185

developing YOLO, 185–186

example of, 163–167

general approach to CNN design, 176

image segmentation with Mask R-CNN, 187

LeNet-5 in Keras, 171–176

manipulation of objects via, 67–68

model inspired by AlexNet, 176–178

model inspired by VGGNet, 178–179

multiple filters, 162–163

object detection with Fast R-CNN, 184

object detection with R-CNN, 183–184

overview of, 159

transfer learning model of, 188–192

two-dimensional structure of visual imagery, 159–160

Convolutional sentiment classifier, 235–239, 252–256

convTranspose layers, in generator networks, 270

Corpus

one-hot encoding of words within, 25–26

preprocessing full, 203–206

word vectors within, 27–29

word2vec architectures for, 208

Cortes, Corinna, curating MNIST dataset, 77–78

Cost (loss) functions

building own project, 319

cross-entropy cost, 113–115

quadratic cost, 112–113

in stochastic gradient descent, 120

training deep networks and, 111

using backpropagation to calculate gradient of, 124–125, 335–337

Cost, minimizing via optimization

batch size and stochastic gradient descent, 119–122

escaping local minimum, 122–124

gradient descent, 115–117

learning rate, 117–119

training deep networks and, 115

Count based, GloVe as, 208

Cross-entropy cost

essential GAN theory, 262

minimizes impact of neuron saturation, 113–115, 131

pairing with weight initialization, 131–135

CycleGANs, style transfer of well-known painters, 44–45

D

Dahl, George, 24–25

Data

augmentation, training deep networks, 145

development of AGI and, 327

Data generators, training, 190–191

DataFrame, IMDb validation data, 234

Datasets, deep reinforcement learning using larger, 57

De-convolutional layers, generator networks, 269–270, 272

deCNN, generator network as, 270

Deep Blue, history of chess, 65

Deep learning

code. See Coding shallow network in Keras

computational representations of language. See Language, computational representations of

definition of, 22

elements of natural human language in, 33–35

Google Duplex as NLP based on, 35–37

model architectures, 51–52

natural language processing and, 23–25, 37

networks learn representations automatically, 22–23

reinforcement learning combined with. See Reinforcement learning, deep

training deep networks. See Training deep networks

Deep learning, introduction

biological vision, 3–8

machine vision. See Machine vision

Quick, Draw! game, 19

summary, 20

TensorFlow Playground, 17–19

traditional machine learning vs., 11–12

Deep learning projects, building own

artificial general intelligence approach, 326–328

converting existing machine learning project, 316–317

deep learning libraries, 321–324

deep reinforcement learning, 316

machine vision and GANs, 313–315

modeling process, including hyperparameter tuning, 318–321

natural language processing, 315–316

overview of, 313

resources for further projects, 317–318

Software 2.0, 324–326

summary, 328–329

Deep networks, improving

deep neural network in Keras, 147–149

fancy optimizers, 145–147

key concepts, 154–155

model generalization (avoiding overfitting), 140–145

overview of, 131

regression, 149–152

summary, 154

TensorBoard, 152–154

unstable gradients, 137–139

weight initialization, 131–135

Xavier Glorot distributions, 135–137

Deep Q-learning networks (DQNs)

DeepMind video game and, 58–60

defining DQN agent. See DQN agents

essential theory of, 290–292

SLM Lab using, 304–306

Deep reinforcement learning. See Reinforcement learning, deep

Deep RL agents, 306–308

DeepMind

AlphaGo board game, 61–62

AlphaGo Zero board game, 62–65

Google acquiring, 59

video games, 58–60

DeepMind Lab

building own deep learning project with, 316

deep reinforcement learning, 69, 71

Dendrites, and biological neurons, 85–86

Denormalization, in batch normalization, 139

Dense layers

architecting intermediate net in Keras, 127–128

artificial neural networks with, 99–100

CNN model inspired by AlexNet, 177–178

computational complexity and, 160

convolutional layers vs., 168

deep learning and, 51

Fast R-CNN and, 184

in GANs, 271–272

general approach to CNN design, 176

LeNet-5 in Keras and, 172–173, 175–176

multi-ConvNet model architecture, 253–255

in natural language processing, 224–225, 230–231, 236–238

networks designed for sequential data, 243

in shallow networks, 109

using weight initialization for deep networks, 132–133, 137

in wide and deep model architecture, 317

Dense network

architecture, 229–235

building socially beneficial projects, 318

defined, 100

hot dog-detecting, 101–106

revisiting shallow network, 108–110

softmax layer of fast food-classifying network, 106–108

Dense sentiment classifier, 229–235

Dense Sentiment Classifier Jupyter notebook. See Natural language classification

Dependencies

Cart-Pole DQN agent, 293

convolutional sentiment classifier, 236

LeNet-5 in Keras, 171

loading GAN for Quick, Draw! game, 264–265

loading IMDb film reviews, 222–223

preprocessing natural language, 197

regression model, 150

TensorFlow with Keras layers, 323

Dimensionality reduction, plotting word vectors, 213–217

Discount factor (decay), Markov decision processes, 288–289

Discounted future reward

expected, 290

maximizing, 290

Discriminator network, GANs

code for training, 277

defined, 40–41

overview of, 266–269

training, 259–262

Distributed representations, localist representations vs., 32

Dot product notation, perceptron equation, 90–91

DQN agents

agents beyond, 306–308

building neural network model for, 297–298

drawbacks of, 306

hyperparameter optimization using SLM Lab, 304

initialization parameters, 295–297

interacting with OpenAI Gym environment, 300–303

overview of, 293–295

remembering gameplay, 298

selecting action to take, 299–300

training via memory replay, 298–299

DQNs. See Deep Q-learning networks (DQNs)

Dropout

for AlexNet in Keras, 177

for deep neural network in Keras, 148

for LeNet-5 in Keras, 171–174

network architecture regression model and, 150–151

preventing overfitting with, 142–145

E

Eager mode, TensorFlow, 322–323

Ease of use, Software 2.0, 326

Efros, Alexei, 44

ELMo (embeddings from language models), transfer learning, 251

Elo scores

AlphaGo game, 62–63

AlphaGo Zero game, 64–65

AlphaZero game, 66

Encoder-decoder structure, NMT, 250

Environment(s)

DeepMind DQN, 58

OpenAI Gym, 300–303

popular deep reinforcement learning, 68

reinforcement learning problems of machine learning, 54–56

reinforcement learning theory, 283

training agents simultaneously via SLM Lab in multiple, 304

Epochs of training, checkpointing model parameters after, 231–232

Essential theory. See Theory, essential

exp function, softmax layer of fast food-classifying network, 106–108

Expanding path, U-Net, 187–188

Experiment graph, SLM Lab, 304

Expertise, subject-matter

AlexNet reducing requirement for, 17

deep learning easing requirement for, 22–23

Exploding gradients, ANNs, 138

Extrinsic evaluations, evaluating word vectors, 209

F

Face detection

arithmetic on fake human faces, 41–44

birth of generative adversarial networks, 39–41

engineered features for robust real-time, 12–13

in visual cortex, 8

Facebook, fastText library, 33

False negative, IMDb reviews, 236

False positive, IMDb reviews, 235

Fan Hui, AlphaGo match, 62

Fancy optimizers, deep network improvement, 145–147

Fashion-MNIST dataset, deep learning project, 313–315

Fast food-classifying network, softmax layer of, 106–108

Fast R-CNN, object detection, 184

Faster R-CNN, object detection, 184–185

FastText, 33, 209

Feature engineering

AlexNet automatic vs. expertise-driven, 17

defined, 11

traditional machine learning and, 12–13

traditional machine learning vs. deep learning, 11–12

Feature maps

convolutional example of, 163–167

image segmentation with U-Net, 188

transfer learning model and, 188–192

Feedforward neural networks, training, 241

FetchPickAndPlace, OpenAI Gym, 70

Figure Eight

image-classification model, 315

natural language processing model, 316

Filters. See Kernels (filters)

Finn, Chelsea, 67

fit_generator() method, transfer learning, 191–192

Fitting, dense sentiment classifier, 232

Flatten layer, LeNet-5 in Keras, 171–174

FloatTensor, PyTorch, 339

for loop, GAN training, 275–281

Formal notation, neural networks, 333–334

Forward propagation

backpropagation vs., 124

defined, 103

in hot dog-detecting dense network, 101–106

notation for neural networks, 334

in softmax layer of fast food-classifying network, 106–108

in stochastic gradient descent, 120, 121

Frozen Lake game, 316

Fukushima, Kunihiko, LeNet-5, 9–12

Fully connected layer (as dense layer), 99

Functional API, non-sequential architectures and Keras, 251–256

Fusiform face area, detecting in visual cortex, 8

G

Game-playing machines

artificial intelligence, 49–50

artificial neural networks (ANNs), 51

board games, 59–66

categories of AI, 71–72

categories of machine learning problems, 53–56

deep learning, 51–52

deep reinforcement learning, 56–57

machine learning, 50

machine vision, 52–53

manipulation of objects, 67–68

natural language processing, 53

overview of, 49–50

popular deep reinforcement learning environments, 68–71

representation learning, 51

Software 2.0 and, 326

summary, 72

video games, 57–59

Gameplay, 298–300

gamma (γ), batch normalization adding, 139

GANs. See Generative adversarial networks (GANs)

Gated recurrent units (GRUs), 249–250

gberg_sents, tokenizing natural language, 199

Generative adversarial networks (GANs)

actor-critic algorithm reminiscent of, 308

adversarial network component, 272–274

arithmetic on fake human faces, 41–44

birth of, 39–41

building and tuning own, 315

creating photorealistic images from text, 45–46

discriminator network component, 266–269

essential theory, 259–262

generator network component, 269–272

high-level concepts behind, 39

image processing using deep learning, 47–48

key concepts, 281–282

making photos photorealistic, 45

Quick, Draw! game dataset, 263–266

reducing computational complexity with, 170

Software 2.0 and, 326

style transfer, 44–45

summary, 281

training, 275–281

Generator network, GANs

code for training, 277–278

defined, 40–41

overview of, 269–272

training, 259–262

Geoff Hinton, 94–95

Girshick, Ross, 183–184

GitHub repository, Quick, Draw! game dataset, 263

Global minimum of cost, training deep networks for, 122–124

Glorot normal distribution, improving deep networks, 135–137

GloVe

converting natural words to word vectors, 28

as major alternative to word2vec, 208

Go board game, 59–66

Goodfellow, Ian

arithmetic on fake human faces and, 41–44

birth of GANs, 39–41

MNIST dataset used by, 76–77

Google Duplex technology, deep-learning-based NLP, 35–37

GPUs (graphics processing units), deep reinforcement learning, 57

Gradient descent

batch size and stochastic, 119–122

cross-entropy costs and, 114

enabling neural networks to learn, 113

escaping local minimum using, 122–124

learning rate in, 117–119

minimizing cost with, 115–117

training deep networks with batch size/stochastic, 119–122

Graesser, Laura, 304

Graphics processing units (GPUs), deep reinforcement learning, 57

GRUs (gated recurrent units), 249–250

Gutenberg, Johannes, 197

H

HandManipulateBlock, OpenAI Gym, 70

Handwritten digits, MNIST, 76–78

Hassabis, Demis, 58–59

Hidden layers

artificial neural network with, 99

building network model for DQN agent, 297

calculus behind backpropagation, 337

deep learning model architectures, 51–52

dense layers within. See Dense layers

forward propagation in dense network through, 102–106

hot dog-detecting dense network, 101–106

neural network notation, 333–334

schematic diagram of shallow network, 79

TensorFlow Playground demo, 100

tuning neuron count and number of, 125–126

Hidden state, LSTM, 245

Hierarchical softmax, training word2vec, 208

Hinton, Geoffrey

developing capsule networks, 192

developing t-distributed stochastic neighbor embedding, 213–214

as godfather of deep learning, 14–15

Histogram of validation data

convolutional sentiment classifier, 239

dense sentiment classifier, 233–234

Hochreiter, Sepp, 244

Hot dog-detecting dense network, 101–106

Hot dog/not hot dog detector, perceptrons, 86–90

Hubel, David

LeNet-5 model built on work of, 10–12

machine vision approach using work of, 8–9

research on visual cortex, 4–7

Human and machine language. See also Language, computational representations of

deep learning for natural language processing, 21–25

elements of natural human language in, 33–35

Google Duplex technology, 35–37

summary, 37

Humanoid, OpenAI Gym environment, 70

Hyperparameters. See also Parameters

in artificial neural networks, 130

automating search for, 321

batch size, 119

Cart-Pole DQN agent, 293–295

convolutional filter, 163, 167

convolutional sentiment classifier, 236–237

learning rate, 118

for loading IMDb film reviews, 223–225

LSTM, 246–247

multi-ConvNet sentiment classifier, 253

network depth, 125–126

number of epochs of training, 122

optimizing with SLM Lab, 303–306

reducing model overfitting with dropout, 144–145

RMSProp and AdaDelta, 147

RNN sentiment classifier, 242–243

tuning own project, 318–321

understanding in this book, 118–119

I

IMDb (Internet Movie Database) film reviews. See Natural language classification

ILSVRC (ImageNet Large Scale Visual Recognition Challenge)

AlexNet and, 14–17

ResNet, 182

traditional ML vs. deep learning entrants, 13–14

Image classification

building socially beneficial projects using, 318

ILSVRC competition for, 182

machine vision datasets for deep learning, 313–315

object detection vs., 183

Image segmentation applications, machine vision, 186–188

ImageDataGenerator class, transfer learning, 190–191

ImageNet, and ILSVRC, 13–14

Images

creating photorealistic. See Machine art

processing using deep learning, 46–48

Imitation learning, agents beyond DQN optimizing, 307

Infrastructure, rapid advances in, 327

Initialization parameters, DQN agent, 295–297

Input layer

artificial neural networks with, 99

of deep learning model architectures, 51–52

hot dog-detecting dense network, 101–106

LSTM, 245

notation for neural networks, 333

of perceptrons, 86–88

schematic diagram of shallow network, 79

TensorFlow Playground demo, 100

Installation

of code notebooks, 76

PyTorch, 341

Integer labels, converting to one-hot, 82–83

Intermediate Net in Keras Jupyter notebook, 127–129

Internal covariate shift, batch normalization, 138–139

Internet Movie Database (IMDb) film reviews. See Natural language classification

Intrinsic evaluations, word vectors, 209

iter argument, running word2vec, 210

J

Jones, Michael, real-time face detection, 12–13

K

Kaggle

image-classification model, 315

natural language processing model, 316

Karpathy, Andrej, 324–326

Kasparov, Garry, 65

Keng, Wah Loon, 304

Keras

AlexNet and VGGNet in, 176–179

coding in. See Coding shallow network in Keras

deep learning library in, 321–323

deep neural network in, 147–149

functional API, non-sequential architectures and, 251–256

implementing LSTM, 246–247

implementing RNN, 242

intermediate-depth neural network in, 127–129

LeNet-5 model in, 171–176

loading IMDb film reviews in, 225–226

parameter-adjustment in, 144

TensorBoard dashboard in, 152–154

transfer learning in, 188–192

weight initialization in, 132–135

Kernels (filters)

convolutional example of, 164–167

of convolutional layers, 160–162

number in convolutional layer, 162–163

pooling layers using, 169–170

size, convolutional filter hyperparameter, 167

Key concepts

artificial neural networks (ANNs), 110

artificial neurons that constitute ANNs, 97

deep reinforcement learning, 308–309

generative adversarial networks (GANs), 281–282

improving deep networks, 154–155

machine vision, 193

natural language processing (NLP), 256–257

training deep networks, 130

Krizhevsky, Alex, 14, 16

L

L1 vs. L2 regularization, reducing model overfitting, 141–142

Language. See Human and machine language

Language, computational representations of

localist vs. distributed representations, 32–33

one-hot representations of words, 25–26

overview of, 25

word vector-arithmetic, 29–30

word vectors, 26–29

word2viz tool for exploring, 30–32

LASSO regression, reducing model overfitting, 141–142

Latent space

arithmetic on fake human faces in, 42–44

birth of generative adversarial networks, 40–41

Layers

building own project, 319–320

deep learning model architectures, 51–52

Leaky ReLU activation function, 96

Learn Python the Hard Way (Shaw), 75

Learning rate

batch normalization allowing for higher, 139

building own project, 320

shortcomings of improving SGD with momentum, 146

as step size in gradient descent, 117–119

LeCun, Yann

on fabricating realistic images, 39

LeNet-5 model, 9–12

MNIST handwritten digits curated by, 76–78

PyTorch development, 323–324

Turing Award for deep learning, 15

Legg, Shane, 58

Lemmatization, as sophisticated alternative to stemming, 196

LeNet-5 model

AlexNet vs., 15–17

in Keras, 171–176

machine vision, 9–12

Les 3 Brasseurs bar, 39

Levine, Sergey, 67

Li, Fei-Fei, 13–14

Libraries, deep learning, 321–324

Linear regression, object detection with R-CNN, 183–184

List comprehension

adding word stemming to, 201

removing stop words and punctuation, 200–201

load() method, neural network model for DQN agent, 300

Loading data

coding shallow network in Keras, 79–81

for shallow net, 80–81

load_weights() method, loading model parameters, 232

Local minimum of cost, escaping, 122–124

Localist representations, distributed representations vs., 32–33

Long short-term memory (LSTM) cells

bidirectional (Bi-LSTMs), 247–248

implementing with Keras, 246–247

as layer of NLP, 53

overview of, 244–246

Long-term memory, LSTM, 245–246

Lowercase

converting all characters in NLP to, 195–196, 199–200

processing full corpus, 204–206

LSTM. See Long short-term memory (LSTM) cells

LunarLander, OpenAI Gym environment, 70

M

Maaten, Laurens van der, 213–214

Machine art

arithmetic on fake human faces, 41–44

boozy all-nighter, 39–41

creating photorealistic images from text, 45–46

image processing using deep learning, 46–48

make your own sketches photorealistic, 45

overview of, 39

style transfer, 44–45

summary, 48

Machine language. See Human and machine language

Machine learning (ML). See also Traditional machine learning (ML) approach

overview of, 50

reinforcement learning problems of, 54–56

representation learning as branch of, 51

supervised learning problems of, 53–54

traditional machine vs. representation learning techniques, 22

unsupervised learning problems of, 54

Machine translation, NLP in, 23–24

Machine vision

AlexNet, 14–17

AlexNet and VGGNet in Keras, 176–179

CNNs. See Convolutional neural networks (CNNs)

converting existing project, 316–317

datasets for deep learning image-classification models, 313–315

ImageNet and ILSVRC, 13–14

key concepts, 193

LeNet-5, 9–12

LeNet-5 in Keras, 171–176

neocognitron, 8–9

object recognition tasks, 52–53

overview of, 8, 159

pooling layers, 169–170

Quick, Draw! game, 19

residual networks, 179–182

Software 2.0 and, 326

summary, 20, 193

TensorFlow Playground, 17–19

traditional machine learning approach, 12–13

Machine vision, applications of

capsule networks, 192

Fast R-CNN, 184

Faster R-CNN, 184–185

image segmentation, 186–187

Mask R-CNN, 187

object detection, 183

overview of, 182

R-CNN, 183–184

transfer learning, 188–192

U-Net, 187–188

YOLO, 185–186

Magnetic resonance imaging (MRI), and visual cortex, 7–8

Manipulation of objects, 67–68

Markov decision process (MDP), 286–290

Mask R-CNN, image segmentation with, 186–187

Maas, Andrew, 203

matplotlib, weight initialization, 132

max operation, pooling layers, 170

Max-pooling layers

AlexNet and VGGNet in Keras, 176–179

LeNet-5 in Keras, 170–174

MaxPooling2D dependency, LeNet-5 in Keras, 171–174

MCTS (Monte Carlo tree search) algorithm, 61, 66

MDP (Markov decision process), 286–290

Mean squared error, 112, 298

Memory

batch size/stochastic gradient descent and, 119–122

DQN agent gameplay, 298

Software 2.0 and, 326

training DQN agent via replay of, 298–299

Metrics, SLM Lab performance, 305–306

Milestones, deep learning for NLP, 24–25

min_count argument, word2vec, 210–211

Minibatches, splitting training data into, 119–122

ML. See Machine learning (ML)

Mnih, Volodymyr, 58–60

MNIST handwritten digits

calculus for backpropagation, 337

coding shallow network in Keras, 76–78

computational complexity in dense networks, 160

Fashion-MNIST dataset deep learning project, 313–315

loading data for shallow net, 80–81

loss of two-dimensional imagery in dense networks, 159–160

reformatting data for shallow net, 81–83

schematic diagram of shallow network, 77–79

software dependencies for shallow net, 80

in stochastic gradient descent, 120

training deep networks with data augmentation, 145

using in Keras, 171–176

Model generalization. See Overfitting, avoiding

Model optimization, agents beyond DQN using, 307

ModelCheckpoint() object, dense sentiment classifier, 231–232

Modeling process, building own project, 318–321

Momentum, 145–146

Monet, Claude, 44–45

Monte Carlo tree search (MCTS) algorithm, 61, 66

Morphemes, natural human language, 34

Morphology, natural human language, 34–35

most_similar() method, word2vec, 212–213

Motion, detecting in visual cortex, 7–8

Mountain Car game, 316

MRI (magnetic resonance imaging), and visual cortex, 7–8

Müller, Vincent, 72

Multi ConvNet Sentiment Classifier Jupyter notebook, 320

MXNet, deep learning library, 324

N

n-dimensional spaces, 42–43, 339

n-grams, 196, 202–203

Nair, Vinod, 94–95

Natural human language, elements of, 33–35

Natural language classification

dense network classifier architecture, 229–235

examining IMDb data, 227–228

with familiar networks, 222

loading IMDb film reviews, 222–226

processing in document, 23–24

standardizing length of reviews, 228–229

Natural Language Preprocessing Jupyter notebook, 197

Natural language processing (NLP)

area under ROC curve, 217–222

building own deep learning project, 315–316

building socially beneficial projects, 318

computational representations of. See Language, computational representations of

deep learning approaches to, 53

examples, 23–24

Google Duplex as deep-learning, 35–37

history of deep learning, 24–25

key concepts, 256–257

learning representations automatically, 22–23

natural human language elements of, 33–35

natural language classification in. See Natural language classification

networks designed for sequential data, 240–251

non-sequential architectures, 251–256

overview of, 195

preprocessing. See Preprocessing natural language data

Software 2.0 and, 326

summary, 256

transfer learning in, 251

word embedding with word2vec. See word2vec

n_components, plotting word vectors, 214

Negative rewards, reinforcement learning problems and, 56

Negative sampling, training word2vec, 208

Neocognitron

LeNet-5 advantages over, 13–14

LeNet-5 model and, 9–12

machine vision and, 8–9

Nesterov momentum optimizer, stochastic gradient descent, 146

Network architecture, regression model, 150–151

Network depth, as hyperparameter, 125–126

Neural Information Processing Systems (NIPS) conference, 41

Neural machine translation (NMT), seq2seq models, 250

Neural networks

building deep in PyTorch, 343–344

coding shallow in Keras, 83

formal notation for, 333–334

Neuron saturation. See Saturated neurons

Neurons

AlexNet vs. LeNet-5, 17

behaviors of biological, 85–86

forming primary visual cortex, 4–7

neocognitron and, 8–9

regions processing visual stimuli in visual cortex, 7–8

TensorFlow Playground and, 17–19

tuning hidden-layer count and number of, 126

next_state, DQN agent gameplay, 298

NIPS (Neural Information Processing Systems) conference, 41

n_iter, plotting word vectors, 214

NLP. See Natural language processing (NLP)

NMT (neural machine translation), seq2seq models, 250

Noë, Alva, 39

Non-sequential model architecture, 251–256

Non-trainable params, model object, 109–110

Nonlinearity, of ReLU neuron, 95

Notation, formal neural network, 333–334

Number of epochs of training

as hyperparameter, 122

rule of thumb for learning rate, 119

stochastic gradient descent and, 119–122

training deep learning model, 83–84

NumPy

PyTorch tensors and, 324, 339

selecting action for DQN agent, 299–300

weight initialization, 132, 134

O

Object detection

with Fast R-CNN, 184

as machine vision application, 182–183

with R-CNN, 183–184

understanding, 183

with YOLO, 185–186

Objective function (π), maximizing reward with, 290

Objects

manipulation of, 67–68

recognition tasks of machine vision, 52–53

Occam’s razor, neuron count and, 126

Oliveira, Luke de, 315, 316

On-device processing, machine learning for, 46–48

One-hot format

computational representations of language via, 25–26

converting integer labels to, 82–83

localist vs. distributed representations, 32–33

Online resources

building deep learning projects, 317–318

pretrained word vectors, 230

OpenAI Gym

building deep learning projects, 316

Cart-Pole game, 284–286

deep reinforcement learning, 68–70

interacting with environment, 300–303

Optimal policy

building neural network model for, 288–290

estimating optimal action via Q-learning, 290–292

Optimal Q-value (Q*), estimating, 291–292

Optimization

agents beyond DQN using, 306–307

fancy optimizers for stochastic gradient descent, 145–147

hyperparameter optimizers, 130, 303–306

minimizing cost via. See Cost, minimizing via optimization

stochastic gradient descent. See Stochastic gradient descent (SGD)

Output layer

artificial neural network with, 99

batch normalization and, 139

building network model for DQN agent, 298

calculus behind backpropagation, 335, 337

deep learning model architectures, 51–52

LSTM, 245

notation for neural networks, 334

perceptrons, 86–87, 89

schematic diagram of shallow network, 79

softmax layer for multiclass problems, 106–108

softmax layer of fast food-classifying network, 106–107

TensorFlow Playground demo, 100

Overfitting, avoiding

building your own project, 320

data augmentation, 145

dropout, 142–145

L1 and L2 regularization, 141–142

model generalization and, 140–141

P

Pac-Man

discount factor (decay) and, 288–289

DQN agent initialization and, 296

Padding

convolutional example of, 163–167

as convolutional filter hyperparameter, 167–168

standardizing length of IMDb film reviews, 228–229

Parameter initialization, building own project, 319

Parameters. See also Hyperparameters

Cart-Pole DQN agent initialization, 295–297

creating dense network classifier architecture, 230–232

escaping local minimum, 122–124

gradient descent minimizing cost across multiple, 116–117

pooling layers reducing overall, 169–170

saving model, 300

weight initialization, 132–135

Parametric ReLU activation function, 96

Partial-derivative calculus, cross-entropy cost, 114–115

Patches, in convolutional layers, 160

PCA (principal component analysis), 213

Perceptrons

choosing, 96

hot dog/not hot dog detector example, 86–90

modern neurons vs., 91

as most important equation in this book, 90–91

overview of, 86

Performance

hyperparameter optimization using SLM Lab, 303–306

Software 2.0 and, 326

PG. See Policy gradient (PG) algorithm

Phonemes, natural human language and, 34

Phonology, natural human language and, 34–35

Photorealistic images, creating. See Machine art

Phraser() method, NLP, 202–203, 204–205

Phrases() method, NLP, 202–203, 204–205

Pichai, Sundar, 35–36

pix2pix web application, 45–46

Pixels

computational complexity and, 160

converting integers to floats, 82

convolutional example of, 163–167

convolutional layers and, 160–162

handwritten MNIST digits as, 77–78

kernel size hyperparameter of convolutional filters, 167

reformatting data for shallow net, 81–83

schematic diagram of shallow network, 78–79

two-dimensional imagery and, 159–160

Plotting

GAN training accuracy, 281

GAN training loss, 280–281

word vectors, 213–217

Policy function (π), discounted future reward, 288–290

Policy gradient (PG) algorithm

actor-critic using Q-learning with, 307–308

in deep reinforcement learning, 68

REINFORCE algorithm as, 307

Policy networks, AlphaGo, 61

Policy optimization

agents beyond DQN using, 307

building neural network model for, 288–290

estimating optimal action via Q-learning, 290–292

RL agent using actor-critic with, 307–308

Pooling layers, 169–170, 176

Positive rewards, deep reinforcement learning, 56, 57

Prediction

selecting action for DQN agent, 300

training dense sentiment classifier, 232

training DQN agent via memory replay, 299

word2vec using predictive models, 208

Preprocessing natural language data

converting all characters to lowercase, 199–200

full corpus, 203–206

handling n-grams, 202–203

overview of, 195–197

removing stop words and punctuation, 200–201

stemming, 201

tokenization, 197–199

Principal component analysis (PCA), 213

Probability distribution, Markov decision processes, 288

Processing power, AlexNet vs. LeNet-5, 16–17

Project Gutenberg. See Preprocessing natural language data

Punctuation

processing full corpus, 204–206

removing, 196, 200

Python, for example code in this book, 75–76

PyTorch

building deep neural network in, 343–344

deep learning library, 323–324

features, 339–340

installation, 341

in practice, 341–343

TensorFlow vs., 340–341

Q

Q-learning networks

actor-critic combining PG algorithms with, 307–308

DQNs. See Deep Q-learning networks (DQNs)

Q-value functions

agents beyond DQN optimizing, 306

drawbacks of DQN agents, 306

estimating optimal, 291–292

training DQN agent via memory replay, 299

Quadratic cost, 112–113

Quake III Arena, DeepMind Lab built on, 69

Quick, Draw! game

GANs and, 263–266

for hundreds of machine-drawn sketches, 48

introduction to deep learning, 19

R

R-CNN

Fast R-CNN, 184

Faster R-CNN, 184–185

Mask R-CNN, 186–187

object detection application, 183–184

Radford, Alec, 41–44

RAM (memory), batch size/stochastic gradient descent and, 119–122

rand function, DQN agent action selection, 299–300

randrange function, DQN agent action selection, 300

Rectified linear unit neurons. See ReLU (rectified linear unit) neurons

Recurrent neural networks (RNNs)

bidirectional LSTM, 247–248

LSTM, 244–247

LSTM cell as layer of NLP in, 53

overview of, 240–244

stacked recurrent models, 248–250

Redmon, Joseph, 185–186

Reformatting data, coding shallow network, 81–83

Regions of interest (ROIs)

developing Faster R-CNN, 184–185

image segmentation with Mask R-CNN, 187

object detection with Fast R-CNN, 184

object detection with R-CNN, 183–184

Regression, improving deep networks, 149–152

REINFORCE algorithm, agents beyond DQN using, 307

Reinforcement learning

building socially beneficial projects, 318

essential theory of, 283–286

overview of, 49

problems of machine learning, 54–56

as sequential decision-making problems, 284

Reinforcement Learning: An Introduction (Barto), 292

Reinforcement learning, deep

agents beyond DQN, 306–308

board games. See Board games

building own project. See Deep learning projects, building own

Cart-Pole game, 284–286

DeepMind DQN using, 58–59

defining DQN agent, 293–300

essential theory of deep Q-learning networks, 290–292

essential theory of reinforcement learning, 283–286

game-playing applications. See Game-playing machines

hyperparameter optimization with SLM Lab, 303–306

interacting with OpenAI Gym environment, 300–303

key concepts, 308–309

manipulation of objects, 67–68

Markov decision processes, 286–288

optimal policy, 288–290

overview of, 56–57, 283

popular learning environments for, 68–71

summary, 308

video games, 57–60

ReLU (rectified linear unit) neurons

with Glorot distributions, 136–137

neural network model for DQN agent, 297

overview of, 94–95

as preferred neuron type, 96

TensorFlow Playground demo, 100

Representation learning, 22, 51

requires_grad argument, PyTorch, 342

Residual connections, 180–182

Residual modules, 180–182

Residual networks (ResNets), 180–182

Resources, building deep learning projects, 317–318

return_sequences=True, stacking recurrent layers, 248

Reward(s)

deep Q-learning network theory, 290–292

DeepMind DQN and, 59

DeepMind Lab, 69, 71

DQN agent gameplay, 298

Markov decision processes (MDPs), 287–289

optimal policy, 288–290

reinforcement learning problems and, 56

theory of reinforcement learning, 283

training DQN agent via memory replay, 298–299

Ridge regression, reducing model overfitting, 141–142

RMSProp, 147

RMSProp optimizer, 147

ROC AUC metric

as area under ROC curve, 217–218

calculating, 219–222, 234

confusion matrix, 218–219

for sentiment classifier model architectures, 256

ROIs. See Regions of interest (ROIs)

Rosenblatt, Frank, 86–90

Round of training, stochastic gradient descent, 120–121

Running time, Software 2.0 and, 325

S

Sabour, Sara, 192

Saturated neurons

as flaw in calculating quadratic cost, 112–113

minimizing impact using cross-entropy cost, 113–115

reducing with cross-entropy cost and weight initialization, 131–135

weight initialization, Glorot normal distribution, 136

Saving model parameters, 300

Schematic diagram

activation values in feature map of convolutional layer, 164

coding shallow network in Keras, 77–79

of discriminator network, 268

of generator network, 270

of LSTM, 245

of recurrent neural network, 241

wide and deep modeling, 317

Schmidhuber, Jürgen, 244

Search, automating hyperparameter, 321

Search engines, NLP in, 23–24

Sedol, Lee, 62

See-in-the-Dark dataset, image processing, 47–48

Semantics, natural human language and, 34–35

sentences argument, word2vec, 210

Sentiment classifier

bidirectional LSTM, 247–248

convolutional, 236–239

dense, 229–235

LSTM architecture, 247

LSTM hyperparameters, 246–247

non-sequential architecture example, 251–255

performance of model architectures, 256

seq2seq (sequence-to-sequence), and attention, 250

Sequential decision-making problems, 284

Sequential model, building for DQN agent, 297–298

sg argument, word2vec, 210

SG (skip-gram) architecture, 207, 208

SGD. See Stochastic gradient descent (SGD)

Shadow Dexterous Hand, OpenAI Gym, 70

Shallow network

coding. See Coding shallow network in Keras

for dense networks, 108–110

intermediate-depth neural network in, 127–129

vs. deep learning, 78–79

Shogi, AlphaZero and, 65–66

Short-term memory, LSTM, 245–246

Sigmoid Function Jupyter notebook, 105

Sigmoid neuron(s)

activation function of, 92–94

for binary classification problems, 100–101, 105–106

choosing, 96

for shallow net in Keras, 79, 83

softmax function with single neuron equivalent to using, 108

weight initialization and, 133–137

Silver, David, 61–62, 65–66

Similarity score, running word2vec, 212–213

Simple neurons

forming primary visual cortex, 6–7

neocognitron and, 8–9

SimpleRNN() layer, RNN sentiment classifier, 243

size argument, word2vec, 210

Skip-gram (SG) architecture, 207, 208

SLM Lab, 303–306, 316

Socially beneficial projects, deep learning projects, 318

Sedol, Lee, 62, 64

Softmax layer, fast food-classifying network, 106–108

Softmax probability output, Fast R-CNN, 184

Software dependencies, shallow net in Keras, 80

Software 2.0, deep learning models, 324–326

Speech recognition, NLP in, 24

Spell-checkers, 24

Squared error, as quadratic cost, 112

Stacked recurrent models, 248–250

StackGAN, photorealistic images from text, 45–46

State(s)

deep Q-learning network theory and, 290–292

DeepMind DQN and, 58

DQN agent, remembering gameplay, 298

Markov decision processes and, 286

optimal policy in deep reinforcement learning and, 289–290

reinforcement learning problems and, 56

reinforcement learning via Cart-Pole game and, 286

theory of reinforcement learning, 284

Static scatterplot, plotting word vectors, 214–216

Stemming, word

forgoing removal of, 203–206

overview of, 201

preprocessing natural language via, 196

Stochastic gradient descent (SGD)

escaping local minimum of cost via, 122–124

fancy optimizers for, 145–147

training deep networks using batch size and, 119–124

Stop words

forgoing removal of, 203–206

how to remove, 200

removing in NLP, 195–196

Stride length

as convolutional filter hyperparameter, 167

pooling layers using, 169–170

reducing computational complexity, 170

Style transfer, 44–45

Suleyman, Mustafa, 58

Supervised learning problems, machine learning, 53–54

Support vector machines, R-CNN, 183–184

Sutskever, Ilya, 14, 16

Sutton, Richard, 292

Syntax, natural human language and, 34–35

T

Tacotron, TTS engine, 36–37

Tanh neurons

activation function of, 94

choosing, 96

with Glorot distributions, 136–137

LSTM, 244–245

Target word

converting natural words to word vectors, 27–28

running word2vec, 207–209

Tensor processing units (TPUs), Google training neural networks, 64

TensorBoard dashboard, 152–154

TensorFlow, 321–323

TensorFlow Playground, 17–19, 100

Tensors, PyTorch

automatic differentiation in, 342–343

building deep neural network, 343–344

compatibility with NumPy operations, 324

features, 339–340

Terminal state, theory of reinforcement learning, 284

Text, creating photorealistic images from, 45–46

Text-to-speech (TTS) engine, Google Duplex, 36–37

Theano, deep learning library, 324

Theory, essential

of deep Q-learning networks, 290–292

of GANs, 259–262

of reinforcement learning, 283–284

of RNNs, 240–244

of word2vec, 206–209

Threshold value, perceptron equation, 89–91

Tokenization

examining IMDb data, 226–228

natural human language and, 35–36

preprocessing natural language, 195, 197–199

Torch, PyTorch as extension of, 323–324

torch.nn.NLLLoss() function, PyTorch, 344

TPUs (tensor processing units), Google training neural networks, 64

Traditional machine learning (ML) approach

deep learning approach vs., 11–12

entrants into ILSVRC using, 14–15

natural human language in, 33–35

one-hot encoding of words in, 25–26

understanding, 12–13

train() method

training DQN agent, 299

training GAN, 275–281

Training

AlexNet vs. LeNet-5, 16–17

AlphaGo vs. AlphaGo Zero, 63–65

TensorFlow Playground, 17–19

Training deep networks

adversarial network, 272–274

backpropagation, 124–125

batch size and stochastic gradient descent, 119–122

coding shallow network in Keras, 83–84

convolutional sentiment classifier, 238

cost functions, 111–115

cross-entropy cost, 113–115

data augmentation for, 145

deep neural network in Keras, 147–149

dense sentiment classifier, 232

escaping local minimum, 122–124

generative adversarial networks (GANs), 259–262, 275–281

gradient descent, 115–117

intermediate-depth neural network, 128–129

intermediate net in Keras, 127–129

key concepts, 130

learning rate, 117–119

minimizing cost via optimization, 115

overview of, 111

preventing overfitting with dropout, 142–145

quadratic cost, 112–113

recurrent neural networks (RNNs), 241

running word2vec, 208

saturated neurons, 112–113

summary, 129–130

transfer learning model of, 188–192

tuning hidden-layer and neuron counts, 125–126

via memory replay for DQN agent, 298–299

Transfer learning

machine vision and, 188–192

natural language and, 230

in NLP, 251

overview of, 188–192

Truncation, standardizing film review length, 228–229

TSNE() method, plotting word vectors, 214–216

TTS (text-to-speech) engine, Google Duplex, 36–37

Two-dimensional images, flattening to one dimension, 82

Two-dimensional structure of visual imagery

overview of, 159–160

retaining in convolutional layers, 167

retaining using LeNet-5 in Keras, 172

U

U-Net, image segmentation, 187–188

ULMFiT (universal language model fine-tuning), transfer learning, 251

United States Postal Service, LeNet-5 reading ZIP codes, 11

Unity ML-Agents plug-in, 71, 304

Unstable gradients, improving deep networks, 137–139

Unsupervised learning problems, machine learning, 54

Upsampling layers, 187, 272

V

Validation data, 232–235, 239

Value functions, Q-learning, 291–292

Value networks, AlphaGo algorithm, 61

Value optimization

agents beyond DQN using, 306

RL agent using actor-critic algorithm and, 307–308

Vanishing gradient problem

in artificial neural networks, 137–138

performance degradation in deep CNNs, 179–180

Vector space

embeddings. See Word vectors

latent space similarities to, 42–43

word meaning represented by three dimensions, 27–29

word-vector arithmetic, 29–30

Venn diagram, 22, 50

VGGNet, 178–179, 188–192

Video games, 57–60

Viola, Paul, 12–13

Visual imagery, two-dimensional structure of, 159–160

Visual perception

cerebral cortex research on, 4–7

development of species on planet due to, 3–4

W

WaveNet, Google Duplex TTS engine, 36–37

Weight initialization, 131–137

Weighted sum, perceptron algorithm, 86–89

Weight(s)

backpropagation and, 125, 335–337

convolutional example of, 163–167

of kernels in convolutional layers, 160–162

minimizing cost via gradient descent, 115–116

notation for neural networks, 334

Wide and deep modeling approach, Google, 317

Wiesel, Torsten

LeNet-5 model built on work of, 10–12

machine vision using work of, 8–9

research on visual cortex, 4–7

window argument, word2vec, 210

Wittgenstein, Ludwig, 21

Word embeddings. See Word vectors

Word vectors. See also word2vec

arithmetic of, 29–30

capturing word meaning, 195

computational representations. See Language, computational representations of

convolutional filters detecting triplets of, 239

evaluating, 209

localist vs. distributed representations, 32–33

in NLP. See Natural language processing (NLP)

online pretrained, 230

plotting, 213–217

training on natural language data, 229–230

word2viz tool for exploring, 30–32

word2vec

converting natural words to word vectors, 28

essential theory behind, 206–209

evaluating word vectors, 209

FastText as leading alternative to, 209

plotting word vectors, 213–217

running, 209–213

word embeddings, 206

Words

creating embeddings with word2vec. See word2vec

natural human language and, 33–35

preprocessing natural language. See Preprocessing natural language data

word_tokenize() method, natural language, 199

workers argument, word2vec, 211

X

Xavier Glorot distributions, improving deep networks, 135–137

Y

Yelp review polarity, 316

YOLO (You Only Look Once), object detection, 185–186

Z

Zhang, Xiang, 315
