Numerics
1 × 1 convolutional layer 220-221
A
AAVER (adaptive attention for vehicle re-identification) 430
acc value 142
accuracy 431-433
as metric for evaluating models 147
improvements to 192
of image classification 185-192
building model architecture 187-189
evaluating models 191-192
importing dependencies 185
preparing data for training 186-187
training models 190-191
activation functions 51-60, 63, 200, 205
binary classifier 54
Heaviside step function 54
leaky ReLU 59-60
linear transfer function 53
logistic function 55
ReLU 58-59
sigmoid function 55
softmax function 57
tanh 58-59
activation maps 108, 252
activation type 165
Adam (adaptive moment estimation) 175
Adam optimizer 190, 352
adaptive learning 170-171
adversarial training 343
AGI (artificial general intelligence) 342
AI vision systems 6
AlexNet 203-211
architecture of 205
data augmentation 206
dropout layers 206
features of 205-207
in Keras 207-210
learning hyperparameters in 210
local response normalization 206
performance 211
ReLU activation function 205
training on multiple GPUs 207
weight regularization 207
algorithms
classifier learning algorithms 33-34
in DeepDream 385-387
alpha 330
AMI (Amazon Machine Images) 442
Anaconda 438-439
anchor boxes 303-305
AP (average precision) 292
artificial neural networks (ANNs) 4, 8, 37, 42, 49, 92
atrous convolutions 318
attention network 302
AUC (area under the curve) 292
augmenting
data 180-181
for image classification 187
in AlexNet 206
images 156
average pooling 115-116, 200
AWS EC2 environment
creating AWS account 441-442
Jupyter notebooks 443-444
connecting remotely to instance 443
setting up 441-444
B
background region 286, 306
backpropagation 86-90
backward pass 87
base networks 313-314
predicting with 314
to extract features 301-302
baseline models 149-150
base_model summary 246-247, 270
batch gradient descent (BGD) 77-85, 171
derivative 80
direction 79-80
gradient 79
learning rate 80
pitfalls of 82-83
step size 79-80
batch hard (BH) 419
batch normalization 181-185
covariate shift
defined 181-182
in neural networks 182-183
in Keras 184
overview 183
batch normalization (BN) 206, 227, 230, 350
batch sample (BS) 421-423
batch weighted (BW) 421
batch_size hyperparameter 51, 85, 190
Bayes error rate 158
biases 63
BIER (boosting independent embeddings robustly) 428
binary classifier 54
binary_crossentropy function 352-353
block1_conv1 layer 378, 381
block3_conv2 layer 378
block5_conv2 layer 383, 395
block5_conv3 layer 378, 383
blocks. See residual blocks
bottleneck layers 221
bottleneck residual block 233
bottleneck_residual_block function 234, 237
bounding box coordinates 322
bounding box prediction 287
bounding boxes
in YOLOv3 324
predicting with regressors 303-304
bounding-box regressors 293, 296-297
build_discriminator function 367
build_model() function 328, 330
C
Cars Dataset, Stanford 372
categories 18
CCTV monitoring 405
cGAN (conditional GAN) 361
chain rule 88
channels value 122
CIFAR dataset 264-265
Inception performance on 229
ResNet performance on 238
CIFAR-10 dataset 99, 133, 185-186
class predictions 287, 322
classes 18
classes argument 237
Class_id label 328
classification 105
classification loss 308
classification module 18, 293, 298
classifier learning algorithms 33-34
classifiers 233
binary 54
in Keras 229
pretrained networks as 254-256
CLVR (cross-level vehicle re-identification) 430
CNNs (convolutional neural networks)
adding dropout layers to avoid overfitting 124-128
advantages of 126
in CNN architecture 127-128
overview of dropout layers 125
overview of overfitting 125
architecture of 102-105, 195-239
AlexNet 203
classification 105
feature extraction 104
GoogLeNet 217-229
Inception 217-229
LeNet-5 199-203
ResNet 230-238
VGGNet 212-216
convolutional layers 107-114
convolutional operations 108-111
kernel size 112-113
number of filters in 111-112
overview of convolution 107-108
padding 113-114
strides 113-114
design patterns 197-199
fully connected layers 119
image classification 92, 121-144
building model architecture 121-122
number of parameters 123-124
weights 123-124
with color images 133-144
with MLPs 93-102
implementing feature visualizer 381-383
overview 102-103, 375-383
pooling layers 114-118
convolutional layers 117-118
max pooling vs. average pooling 115-116
subsampling 114-118
visualizing features 377-381
coarse label 265
COCO datasets 320
collecting data 162
color channel 198
converting to grayscale images 23-26
image classification for 133-144
compiling models 140-141
defining model architecture 137-140
evaluating models 144
image preprocessing 134-136
loading datasets 134
loading models with val_acc 143
training models 141-143
combined models 368-369
combined-image 395
compiling models 140-141
computation problem 242
computer vision. See CV (computer vision)
conda list command 439
confidence threshold 289
confusion matrix 147-148
connection weights 38
content image 392
content loss 393-395
content_image 395
content_loss function 395
content_weight parameter 395
contrastive loss 410-411, 413
CONV_1 layer 122
CONV1 layer 207
CONV_2 layer 118, 123
CONV2 layer 208
CONV3 layer 208
CONV4 layer 208
CONV5 layer 208
ConvNet weights 259
convolution
overview 107-108
convolutional layers 107-114, 117-118, 200, 212, 217
convolutional operations 108-111
kernel size 112-113
number of filters in 111-112
padding 113-114
strides 113-114
convolutional neural network 10
convolutional neural networks. See CNNs (convolutional neural networks)
convolutional operations 108-111
correct prediction 291
cost functions 68
covariate shift
defined 181-182
in neural networks 182-183
cross-entropy 71-72
cross-entropy loss 409-410
cuDNN 442
CV (computer vision) 3-35
applications of 10-15
creating images 13-14
face recognition 15
image classification 10-11
image recommendation systems 15
localization 12
neural style transfer 12
object detection 12
classifier learning algorithms 33-34
extracting features 27-33
automatically extracted features 31-33
handcrafted features 31-33
features
advantages of 33
overview 27-31
image input 19-22
color images 21-22
computer processing of images 21
images as functions 19-20
image preprocessing 23-26
interpreting devices 8-10
pipeline 4, 17-19, 36
sensing devices 7
vision systems 5-6
AI vision systems 6
human vision systems 5-6
visual perception 5
D
Darknet-53 325
data
augmenting 180-181
for image classification 187
in AlexNet 206
collecting 162
loading 331-332
mining 414-423
BH 419
BS 421-423
BW 421
dataloader 414-416
finding useful triplets 416-419
normalizing 154-155, 186
preparing for training 151-156, 186-187
preprocessing 153-156
augmenting images 156
grayscaling images 154
resizing images 154
splitting 151-153
data distillation 137
DataGenerator objects 331
dataloader 414-416
datasets
downloading to GANs 364
Kaggle 267
loading 134
MNIST 203, 263
splitting for training 136
splitting for validation 136
validation datasets 152
DCGANs (deep convolutional generative adversarial networks) 345, 362, 365, 370
deep neural network 48
DeepDream 374, 384-399
algorithms in 385-387
in Keras 387-391
deltas 304
dendrites 38
dense layers. See also fully connected layers
Dense_1 layer 123
Dense_2 layer 123
dependencies, importing 185
deprocess_image(x) 383
design patterns 197-199
detection
measuring speed of 289
multi-stage vs. single-stage 310
diagnosing
overfitting 156-158
underfitting 156-158
dilated convolutions 318
dilation rate 318
dimensionality reduction with Inception 220-223
1 × 1 convolutional layer 220-221
impact on network performance 222
direction 79-80
discriminator 343, 351
discriminator_model method 346, 352
discriminators
in GANs 367
training 352
DL (deep learning) environments
conda environment 440
loading environments 441
manual development environments 439-440
saving environments 441
setting up 439-441
dropout hyperparameter 51
dropout layers 179-180
adding to avoid overfitting 124-128
advantages of 126
in AlexNet 206
in CNN architecture 127-128
overview 125
dropout rate 179
dropout regularization 215
E
early stopping 175-177
EC2 Management Console 442
EC2 On-Demand Pricing page 442
edges 46
embedding networks, training 423-431
finding similar items 424
implementation 426
object re-identification 424-426
testing trained models 427-431
object re-identification 428-431
retrievals 427-428
embedding space 401
endAnaconda 438
environments
conda 440
developing manually 439-440
loading 441
saving 441
epochs 85, 169, 190
number of 51, 175-177
training 353-354
error functions 68-73
advantages of 69
cross-entropy 71-72
errors 72-73
mean squared error 70-71
overview 69
weights 72-73
errors 72-73
evaluate() method 191, 274, 280
evaluation schemes 358-359
Evaluator class 397
exhaustive search algorithm 294
exploding gradients 230
exponential decay 170
F
f argument 234
face identification 15, 402
face recognition (FR) 15, 402
face verification 15, 402
false negatives (FN) 148-149, 291
false positives (FP) 148-149, 289
False setting 394
Fashion-MNIST 264, 363, 372
fashion_mnist.load_data() method 341
Fast R-CNNs (region-based convolutional neural networks) 297-299
architecture of 297
disadvantages of 299
multi-task loss function in 298-299
Faster R-CNNs (region-based convolutional neural networks)
architecture of 300
base network to extract features 301-302
fully connected layers 306-307
multi-task loss function 307-308
object detection with 300-308
RPNs 302-306
anchor boxes 304-305
predicting bounding box with regressor 303-304
training 305-306
FC layer 208
FCNs (fully convolutional networks) 48, 120, 303
feature extraction 104, 301-302
automatically 31-33
handcrafted features 31-33
feature extractors 232, 244, 256-258, 297
feature maps 103-104, 108, 241, 243, 250, 252, 396-397
feature vector 18
feature visualizer 381-383
feature_layers 397
features
advantages of 33
handcrafted 31-33
learning 65-66, 252-253
overview 27-31
transferring 254
visualizing 377-381
feedforward process 62-66
learning features 65-66
FID (Fréchet inception distance) 357-358
filter hyperparameter 138
filter_index 381
filters 111-112
filters argument 117, 234
fine label 265
fine-tuning 258-259
advantages of 259
learning rates when 259
transfer learning 274-282
.fit() method 141
fit_generator() function 332
Flatten layer 95-96, 123, 208, 276
flattened vector 119
FLOPs (floating-point operations per second) 77
flow_from_directory() method 269, 276
foreground region 286, 306
FPS (frames per second) 289, 311
freezing layers 247
F-score 149
fully connected layers 101, 119, 212, 306-307
functions
images as 19-20
training 369-370
G
gallery set 423
GANs (generative adversarial networks) 341-373, 430
applications for 359-362
image-to-image translation 360-361
Pix2Pix GAN 360-361
SRGAN 361-362
architecture of 343-356
DCGANs 345
generator models 348-350
minimax function 354-356
building 362-372
combined models 368-369
discriminators 367
downloading datasets 364
evaluating models of 357-359
choosing evaluation scheme 358-359
FID 358
inception score 358
generators 365-366
importing libraries 364
training 351-354, 370-372
discriminators 352
epochs 353-354
generators 352-353
training functions 369-370
visualizing datasets 364
generative models 342
generator models 348-350
generator_model function 349
generators 343, 351
in GANs 365-366
training 352-353
global average pooling 115
global minima 83
Google Open Images 267
GoogLeNet 217-229
architecture of 226-227
in Keras 225-229
building classifiers 229
building inception modules 228-229
building max-pooling layers 228-229
building network 227
learning hyperparameters in 229
GPUs (graphics processing units) 190, 207, 268, 296, 326, 372, 414, 441
gradient ascent 377
gradient descent (GD) 84-86, 155, 166-167, 184, 377
overview 78
with momentum 174-175
gradients function 382
Gram matrix 396-397
graph transformer network 201
grayscaling
converting color images 23-26
images 154
ground truth bounding box 289-290, 305
GSTE (group-sensitive triplet embedding) 430
H
hard data mining 416
hard negative sample 417
hard positive sample 417
Heaviside step function 54, 60
height value 122
hidden layers 46-47, 50, 62, 65, 119, 203
hidden units 111
high-recall model 149
human in the loop 162
human vision systems 5-6
hyperbolic tangent function 61
hyperparameters
learning
in AlexNet 210
in GoogLeNet 229
in Inception 229
in LeNet-5 202-203
in ResNet 238
in VGGNet 216
neural network hyperparameters 163-164
parameters vs. 163
tuning 162-165
collecting data vs. 162
neural network hyperparameters 163-164
parameters vs. hyperparameters 163
I
identity function 53, 60
if-else statements 30
image classification 10-11
for color images 133-144
compiling models 140-141
defining model architecture 137-140
evaluating models 144
image preprocessing 134-136
loading datasets 134
loading models with val_acc 143
training models 141-143
with CNNs 121-124
building model architecture 121-122
number of parameters 123-124
weights 123-124
with high accuracy 185-192
building model architecture 187-189
evaluating models 191-192
importing dependencies 185
preparing data for training 186-187
training models 190-191
with MLPs 93-102
drawbacks of 99-102
hidden layers 96
input layers 94-96
output layers 96
image classifier 18
image flattening 95
image preprocessing 33
image recommendation systems 15, 403
ImageDataGenerator class 181, 269, 276
ImageNet 265-266
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 204, 211, 224, 230, 266, 293
images 19-22
as functions 19-20
augmenting 156
color images 21-22
computer processing of 21
creating 13-14
grayscaling 154
preprocessing 23-26, 134-136
converting color to grayscale 23-26
one-hot encoding 135
preparing labels 135
splitting datasets for training 136
splitting datasets for validation 136
rescaling 135
resizing 154
image-to-image translation 360-361
Inception 217-229
architecture of 223-224
features of 217-218
learning hyperparameters in 229
modules 222-223
naive version 218-219
performance on CIFAR dataset 229
with dimensionality reduction 220-223
1 × 1 convolutional layer 220-221
impact on network performance 222
inception scores 358
inception_module function 225-226
include_top argument 247, 255, 394
input image 33, 385
input layers 46, 62
input vector 39
input_shape argument 122, 188, 237
instances 443
interpreting devices 8-10
IoU (intersection over union) 289-291, 319
J
Jaccard distance 432
joint training 401
Jupyter notebooks 443-444
K
K object classes 296
Kaggle datasets 267
Keras API
AlexNet in 207-210
batch normalization in 184
DeepDream in 387-391
GoogLeNet in 225-229
building classifiers 229
building inception modules 228-229
building max-pooling layers 228-229
building network 227
LeNet-5 in 200-201
ResNet in 235-237
keras.datasets 134
keras_ssd7.py file 328
kernel 107
kernel size 112-113
kernel_size hyperparameter 138, 187
L
L2 regularization 177-179
label smoothing 432
labeled data 6
labeled images 11
LabelImg application 328
labels 135
lambda parameter 178
lambda value 207
layer_name 382
layers 47-48, 138
1 × 1 convolutional 220-221
dropout 179-180
adding to avoid overfitting 124-128
advantages of 126
in AlexNet 206
in CNN architecture 127-128
overview 125
fully connected 101, 119, 306-307
hidden 47
representing style features 396
leaky ReLU 61-62, 165
learning 166-173
adaptive 170-171
embedding 406-407
features 65-66, 252-253
finding optimal learning rate 169-170
hyperparameters
in AlexNet 210
in GoogLeNet 229
in Inception 229
in LeNet-5 202-203
in ResNet 238
in VGGNet 216
mini-batch size 171-173
See also transfer learning
learning curves, plotting 158-159, 191
learning rate
batch gradient descent 79-80
decay 170-171
derivative and 80
optimal, finding 169-170
when fine-tuning 259
LeNet-5 199-203
architecture of 199
in Keras 200-201
learning hyperparameters in 202-203
on MNIST dataset 203
libraries in GANs 364
linear combination 40
linear datasets 45
linear decay 170
linear transfer function 53, 60
load_data() method 134
load_dataset() method 273, 280
loading
data 331-332
datasets 134
environments 441
models 143
local minima 83
local response normalization 206
localization 12
localization module 293
locally connected layers 101
LocalResponseNorm layer 227
location loss 308
logistic function 55, 61
loss
content loss 393-395
runtime analysis of 412-413
total variation 397
visualizing 334
loss functions 407-413
contrastive loss 410
cross-entropy loss 409-410
naive implementation 412-413
loss value 142-143, 191, 334
lr variable 169
lr_schedule function 202
M
MAC (multiplier-accumulator) 426
MAC operation 426
machine learning
human brain vs. 10
with handcrafted features 31
main path 233
make_blobs 160
matrices 67
matrix multiplication 67
max pooling 115-116, 200
max-pooling layers 228-229
mean absolute error (MAE) 71
mean average precision (mAP) 285, 289, 292, 317, 424, 427
mean squared error (MSE) 70-71
Mechanical Turk crowdsourcing tool, Amazon 266
metrics 140
min_delta argument 177
mini-batch gradient descent (MB-GD) 77, 84-85, 173, 238
mini-batch size 171-173
minimax function 354-356
mining data 414-423
BH 419
BS 421-423
BW 421
dataloader 414-416
finding useful triplets 416-419
mixed2 layer 389
mixed3 layer 389
mixed4 layer 389
mixed5 layer 389
MLPs (multilayer perceptrons) 45
architecture of 46-47
hidden layers 47
image classification with 93-102
drawbacks of 99-102
hidden layers 96
input layers 94-96
output layers 96
layers 47-48
nodes 47-48
MNIST (Modified National Institute of Standards and Technology) dataset 98, 101, 121, 203, 263, 354
models
architecture of 121-122, 137-140, 187-189
building 328, 330
compiling 140-141
configuring 329
designing 149-150
evaluating 144, 156-161, 191-192
building networks 159-161
diagnosing overfitting 156-158
diagnosing underfitting 156-158
evaluating networks 159-161
plotting learning curves 158-159, 191
training networks 159-161
loading 143
of GANs
choosing evaluation scheme 358-359
evaluating 357-359
FID 358
inception score 358
testing 427-431
object re-identification 428-431
retrievals 427-428
training 141-143, 190-191, 333
momentum, gradient descent with 174-175
monitor argument 177
MS COCO (Microsoft Common Objects in Context) 266-267
multi-scale detections 315-318
multi-scale feature layers 315-319
architecture of 318-319
multi-scale detections 315-318
multi-scale vehicle representation (MSVR) 431
multi-stage detectors 310
multi-task learning (MTL) 433
multi-task loss function 298-299, 307-308
N
naive implementation 412-413
naive representation 218-219, 222
n-dimensional array 67
neg_pos_ratio 330
networks 162-165, 222
architecture of 137-140
activation type 165
depth of neural networks 164-165
improving 164-165
width of neural networks 164-165
building 159-161
evaluating 159-161
improving 162-165
in Keras 227
measuring precision of 289
predictions 287-288
pretrained
as classifiers 254-256
as feature extractors 256-258
to extract features 301-302
training 159-161, 397-398
neural networks 36-91
activation functions 51-60
binary classifier 54
Heaviside step function 54
leaky ReLU 59-60
linear transfer function 53
logistic function 55
ReLU 58-59
sigmoid function 55
softmax function 57
tanh 58-59
backpropagation 86-90
covariate shift in 182-183
depth of 164-165
error functions 68-73
advantages of 69
cross-entropy 71-72
errors 72-73
MSE 70-71
overview 69
weights 72-73
feedforward process 62-66
learning features 65-66
hyperparameters in 163-164
learning features 252-253
multilayer perceptrons 45-51
architecture of 46-47
hidden layers 47
layers 47-48
nodes 47-48
optimization 74-77
optimization algorithms 74-86
batch gradient descent 77-83
gradient descent 85-86
MB-GD 84
stochastic gradient descent 83-84
overview 376-377
perceptrons 37-45
learning logic of 43
neurons 43-45
overview 38-42
width of 164-165
neural style transfer 12, 374, 392-399
content loss 393-395
network training 397-398
style loss 396-397
Gram matrix for measuring jointly activated feature maps 396-397
multiple layers for representing style features 396
total variation loss 397
neurons 8, 38, 40, 43-45, 206
new_model 248
NMS (non-maximum suppression) 285, 288-289, 319
no free lunch theorem 163
node values 63
nodes 46-48
noise loss 393
nonlinear datasets 44-45
nonlinearities 51
non-trainable params 124
normalizing data 154-155, 186
nstaller/application.yaml file 440
O
load_weights() method 143
object detection 12
framework 285-292
network predictions 287-288
NMS 288-289
object-detector evaluation metrics 289-292
region proposals 286-287
with Fast R-CNNs 297-299
architecture of 297
disadvantages of 299
multi-task loss function in 298-299
with Faster R-CNNs 300-308
architecture of 300
base network to extract features 301-302
fully connected layers 306-307
multi-task loss function 307-308
RPNs 302-306
with R-CNNs 283-297, 310-337
disadvantages of 296-297
limitations of 310
multi-stage detectors vs. single-stage detectors 310
training 296
with SSD 283-310, 319-337
architecture of 311-313
base networks 313-314
multi-scale feature layers 315-319
NMS 319
training SSD networks 326-335
with YOLOv3 283-320, 325-337
architecture of 324-325
overview 321-324
object re-identification 405-406, 424-426, 428-431
object-detector evaluation metrics 289-292
FPS to measure detection speed 289
IoU 289-291
mAP to measure network precision 289
PR curve 291-292
objectness score 285-286, 313, 322
octaves 385-386
offline training 414
one-hot encoding 135, 187
online learning 171
Open Images Challenge 267
open source datasets 262-267
CIFAR 264-265
Fashion-MNIST 264
Google Open Images 267
ImageNet 265-266
Kaggle 267
MNIST 263
MS COCO 266-267
optimal weights 74
optimization 74-77
optimization algorithms 74-86, 174-177
Adam (adaptive moment estimation) 175
batch gradient descent 77-83
derivative 80
direction 79-80
gradient 79
learning rate 79-80
pitfalls of 82-83
step size 79-80
early stopping 175-177
gradient descent 85-86
overview 78
with momentum 174-175
MB-GD 84
number of epochs 175-177
stochastic gradient descent 83-84
optimization value 74
optimized weights 250
optimizer 352
output layer 40, 47, 62, 119
Output Shape columns 122
overfitting
adding dropout layers to avoid 124-128
diagnosing 156-158
overview 125
regularization techniques to avoid 177-181
augmenting data 180-181
dropout layers 179-180
L2 regularization 177-179
P
padding 113-114, 118, 138, 212, 219
PAMTRI (pose aware multi-task learning) 430
parameters
calculating 123-124
hyperparameters vs. 163
non-trainable params 124
number of 123-124
overview 123
trainable params 124
params
non-trainable 124
trainable 124
PASCAL VOC-2012 dataset 293
Path-LSTM 431
patience variable 177
.pem file 442
perceptrons 37-45
learning logic of 43
neurons 43-45
overview 38-42
step activation function 42
weighted sum function 40
performance metrics 146-149
accuracy 147
confusion matrix 147-148
F-score 149
person re-identification 406
pip install 444
Pix2Pix GAN (generative adversarial network) 360-361
plot_generated_images() function 370
plotting learning curves 158-159, 191
POOL layer 208
POOL_1 layer 123
POOL_2 layer 123
pooling layers 114-118, 203, 217
convolutional layers 117-118
max pooling vs. average pooling 115-116
PR curve (precision-recall curve) 291-292
precision 148, 289
predictions 335
across different scales 323-324
bounding box with regressors 303-304
for networks 287-288
with base network 314
preprocessing
data 153-156
augmenting images 156
grayscaling images 154
normalizing data 154-155
resizing images 154
images 23-26, 134-136
converting color images to grayscale images 23-26
one-hot encoding 135
preparing labels 135
splitting datasets for training 136
splitting datasets for validation 136
pretrained model 244
pretrained networks
as classifiers 254-256
as feature extractors 256-258, 268-274
priors 314
Q
query sets 423
Quick, Draw! dataset, Google 372
R
R-CNNs (region-based convolutional neural networks)
disadvantages of 296-297
limitations of 310
multi-stage detectors vs. single-stage detectors 310
object detection with 283-297, 310-337
training 296
receptive field 108, 213
reduce argument 235
reduce layer 220
reduce shortcut 234
region proposals 286-287
regions of interest (RoIs) 285-286, 293, 295-297, 306, 310
regression layer 299
regressors 303-304
regular shortcut 234
regularization techniques to avoid overfitting 177-181
augmenting data 180-181
dropout layers 179-180
L2 regularization 177-179
ReLU (rectified linear unit) 58-59, 61-62, 96, 106, 111, 118, 139, 151, 160, 165, 188, 199, 203, 205, 209, 212, 231, 366
activation functions 205
leaky 59-60
rescaling images 135
residual blocks 232-235
residual module architecture 230
residual notation 71
resizing images 154
ResNet (Residual Neural Network) 230-238
features of 230-233
in Keras 235-237
learning hyperparameters in 238
performance on CIFAR dataset 238
residual blocks 233-235
results, observing 370-372
retrievals 427-428
RGB (red, green, blue) 21, 360
RoI extractor 297
RoI pooling layer 297, 300, 308
RoIs (regions of interest) 285-286, 293, 295-297, 306, 310
RPNs (region proposal networks) 302-306
anchor boxes 304-305
predicting bounding box with regressors 303-304
training 305-306
runtime analysis of losses 412-413
S
s argument 235
save_interval 369
scalar 67
scalar multiplication 67
scales, predictions across 323-324
scipy.optimize.fmin_l_bfgs_b method 398
sensing devices 7
shortcut path 233
Siamese loss 410
sigmoid function 55, 61, 63, 205
single class 306
single-stage detectors 310
skip connections 230-231
Softmax layer 208, 248, 297
source domain 254
spatial features 99-100
specificity 148
splitting
data 151-153
datasets
for training 136
for validation 136
SRGAN (super-resolution generative adversarial networks) 361
SSD (single-shot detector)
architecture of 311-313
base network 313-314
multi-scale feature layers 315-319
architecture of multi-scale layers 318-319
multi-scale detections 315-318
non-maximum suppression 319
object detection with 283-310, 319-337
training networks 326-335
building models 328
configuring models 329
creating models 330
loading data 331-332
making predictions 335
training models 333
visualizing loss 334
SSDLoss function 330
ssh command 443
StackGAN (stacked generative adversarial network) 359
step activation function 42
step function 38
step functions. See Heaviside step function
step size 79-80
stochastic gradient descent (SGD) 77, 83-85, 171, 427
strides 113-114
style loss 396-397
Gram matrix for measuring jointly activated feature maps 396-397
multiple layers for representing style features 396
style_loss function 393, 396
style_weight parameter 395
subsampling 114-118
supervised learning 6
suppression. See NMS (non-maximum suppression)
synapses 38
synset (synonym set) 204, 266
T
tanh (hyperbolic tangent function) 58-59
tanh activation function 200, 205
TensorFlow playground 165
tensors 67
testing trained model 427-431
object re-identification 428-431
retrievals 427-428
test_path variable 270
test_targets 280
test_tensors 280
TN (true negatives) 147
to_categorical function 160, 187
top-1 error rate 211
top-5 error rate 211, 216
top-k accuracy 427
total variation loss 393, 397
total_loss function 393
total_variation_loss function 397
total_variation_weight parameter 395
TP (true positives) 147, 289
train() function 369-370
trainable params 124
train_acc value 152, 162
training
AlexNet 207
by trial and error 6
discriminators 352
embedding networks 423-431
finding similar items 424
implementation 426
object re-identification 424-426
testing trained models 427-431
epochs 353-354
functions 369-370
GANs 351-354, 370-372
generators 352-353
models 141-143, 190-191, 333
networks 159-161, 397-398
preparing data for 151-156, 186-187
augmenting data 187
normalizing data 186
one-hot encoding labels 187
preprocessing data 153-156
splitting data 151-153
R-CNNs 296
RPNs 305-306
splitting datasets for 136
SSD networks 326-335
building models 328
configuring models 329
creating models 330
loading data 331-332
making predictions 335
training models 333
visualizing loss 334
train_loss value 152
train_on_batch method 352
train_path variable 270
transfer functions 51
in GANs 369-370
linear 53
transfer learning 150, 240-282
approaches to 254-259
using pretrained network as classifier 254-256
using pretrained network as feature extractor 256-258
choosing level of 260-262
when target dataset is large and different from source dataset 261
when target dataset is large and similar to source dataset 261
when target dataset is small and different from source 261
when target dataset is small and similar to source dataset 260-261
fine-tuning 258-259, 274-282
open source datasets 262-267
CIFAR 264-265
Fashion-MNIST 264
Google Open Images 267
ImageNet 265-266
Kaggle 267
MNIST 263
MS COCO 266-267
overview 243-254
neural networks learning features 252-253
transferring features 254
pretrained networks as feature extractors 268-274
when to use 241-243
transferring features 254
transposition 68
triplets, finding 416-419
tuning hyperparameters 162-165
collecting data vs. 162
neural network hyperparameters 163-164
parameters vs. hyperparameters 163
U
underfitting 125, 143, 156-158
untrained layers 248
Upsampling layer 350
UpSampling2D layer 348
V
val_acc 143
val_acc value 142-143, 152, 162
val_error value 157, 176
validation datasets
overview 152
splitting 136
valid_path variable 270
val_loss value 142-143, 152, 169, 191, 334
VAMI (viewpoint attentive multi-view inference) 430
vanishing gradients 62, 230
vector space 67, 401
VeRi dataset 424-425, 428, 430
VGG16 configuration 213, 215-216, 311, 313, 381
VGG19 configuration 213-214, 314
VGGNet (Visual Geometry Group at Oxford University) 212-216
configurations 213-216
features of 212-213
learning hyperparameters in 216
performance 216
vision systems 5-6
AI 6
human 5-6
visual embedding layer 401
visual embeddings 400-436
face recognition 402
image recommendation systems 403
learning embedding 406-407
loss functions 407-413
contrastive loss 410
cross-entropy loss 409-410
naive implementation 412-413
runtime analysis of losses 412-413
mining informative data 414-423
BH 419
BS 421-423
BW 421
dataloader 414-416
finding useful triplets 416-419
training embedding networks 423-431
finding similar items 424
implementation 426
object re-identification 424-426
testing trained models 427-431
visual perception 5
visualizing
datasets 364
features 377-381
loss 334
VUIs (voice user interfaces) 4
W
warm-up learning rate 432
weight connections 46
weight decay 178
weight layers 199
weight regularization 207
weighted sum 38
weighted sum function 40
weights 72-73, 123-124
calculating parameters 123-124
non-trainable params 124
trainable params 124
weights vector 39
width value 122
X
X argument 234
x_test 186
x_train 186
x_valid 186
Y
YOLOv3 (you only look once)
architecture of 324-325
object detection with 283-320, 325-337
overview 321-324
output bounding boxes 324
predictions across different scales 323-324
Z
zero-padding 114