

1 × 1 convolutional layer 220-221


AAVER (adaptive attention for vehicle re-identification) 430

acc value 142

accuracy 431-433

as metric for evaluating models 147

improvements to 192

of image classification 185-192

building model architecture 187-189

evaluating models 191-192

importing dependencies 185

preparing data for training 186-187

training models 190-191

activation functions 51-60, 63, 200, 205

binary classifier 54

Heaviside step function 54

leaky ReLU 59-60

linear transfer function 53

logistic function 55

ReLU 58-59

sigmoid function 55

softmax function 57

tanh 58-59

activation maps 108, 252

activation type 165

Adam (adaptive moment estimation) 175

Adam optimizer 190, 352

adaptive learning 170-171

adversarial training 343

AGI (artificial general intelligence) 342

AI vision systems 6

AlexNet 203-211

architecture of 205

data augmentation 206

dropout layers 206

features of 205-207

in Keras 207-210

learning hyperparameters in 210

local response normalization 206

performance 211

ReLU activation function 205

training on multiple GPUs 207

weight regularization 207

algorithms

classifier learning algorithms 33-34

in DeepDream 385-387

alpha 330

AMI (Amazon Machine Images) 442

Anaconda 438-439

anchor boxes 303-305

AP (average precision) 292

artificial neural networks (ANNs) 4, 8, 37, 42, 49, 92

atrous convolutions 318

attention network 302

AUC (area under the curve) 292

augmenting

data 180-181

for image classification 187

in AlexNet 206

images 156

average pooling 115-116, 200

AWS EC2 environment

creating AWS account 441-442

Jupyter notebooks 443-444

remotely connect to instance 443

setting up 441-444


background region 286, 306

backpropagation 86-90

backward pass 87

base networks 313-314

predicting with 314

to extract features 301-302

baseline models 149-150

base_model summary 246-247, 270

batch gradient descent (BGD) 77-85, 171

derivative 80

direction 79-80

gradient 79

learning rate 80

pitfalls of 82-83

step size 79-80

batch hard (BH) 419

batch normalization 181-185

covariate shift

defined 181-182

in neural networks 182-183

in Keras 184

overview 183

batch normalization (BN) 206, 227, 230, 350

batch sample (BS) 421-423

batch weighted (BW) 421

batch_size hyperparameter 51, 85, 190

Bayes error rate 158

biases 63

BIER (boosting independent embeddings robustly) 428

binary classifier 54

binary_crossentropy function 352-353

block1_conv1 layer 378, 381

block3_conv2 layer 378

block5_conv2 layer 383, 395

block5_conv3 layer 378, 383

blocks. See residual blocks

bottleneck layers 221

bottleneck residual block 233

bottleneck_residual_block function 234, 237

bounding box coordinates 322

bounding box prediction 287

bounding boxes

in YOLOv3 324

predicting with regressors 303-304

bounding-box regressors 293, 296-297

build_discriminator function 367

build_model() function 328, 330


Cars Dataset, Stanford 372

categories 18

CCTV monitoring 405

cGAN (conditional GAN) 361

chain rule 88

channels value 122

CIFAR dataset 264-265

Inception performance on 229

ResNet performance on 238

CIFAR-10 dataset 99, 133, 185-186

class predictions 287, 322

classes 18

classes argument 237

Class_id label 328

classification 105

classification loss 308

classification module 18, 293, 298

classifier learning algorithms 33-34

classifiers 233

binary 54

in Keras 229

pretrained networks as 254-256

CLVR (cross-level vehicle re-identification) 430

CNNs (convolutional neural networks)

adding dropout layers to avoid overfitting 124-128

advantages of 126

in CNN architecture 127-128

overview of dropout layers 125

overview of overfitting 125

architecture of 102-105, 195-239

AlexNet 203

classification 105

feature extraction 104

GoogLeNet 217-229

Inception 217-229

LeNet-5 199-203

ResNet 230-238

VGGNet 212-216

convolutional layers 107-114

convolutional operations 108-111

kernel size 112-113

number of filters in 111-112

overview of convolution 107-108

padding 113-114

strides 113-114

design patterns 197-199

fully connected layers 119

image classification 92, 121-144

building model architecture 121-122

number of parameters 123-124

weights 123-124

with color images 133-144

with MLPs 93-102

implementing feature visualizer 381-383

overview 102-103, 375-383

pooling layers 114-118

convolutional layers 117-118

max pooling vs. average pooling 115-116

subsampling 114-118

visualizing features 377-381

coarse label 265

COCO datasets 320

collecting data 162

color channel 198

converting to grayscale images 23-26

image classification for 133-144

compiling models 140-141

defining model architecture 137-140

evaluating models 144

image preprocessing 134-136

loading datasets 134

loading models with val_acc 143

training models 141-143

combined models 368-369

combined-image 395

compiling models 140-141

computation problem 242

computer vision. See CV (computer vision)

conda list command 439

confidence threshold 289

confusion matrix 147-148

connection weights 38

content image 392

content loss 393-395

content_image 395

content_loss function 395

content_weight parameter 395

contrastive loss 410-411, 413

CONV_1 layer 122

CONV1 layer 207

CONV_2 layer 118, 123

CONV2 layer 208

CONV3 layer 208

CONV4 layer 208

CONV5 layer 208

ConvNet weights 259

convolution

overview 107-108

convolutional layers 107-114, 117-118, 200, 212, 217

convolutional operations 108-111

kernel size 112-113

number of filters in 111-112

padding 113-114

strides 113-114

convolutional neural network 10

convolutional neural networks. See CNNs (convolutional neural networks)

convolutional operations 108-111

correct prediction 291

cost functions 68

covariate shift

defined 181-182

in neural networks 182-183

cross-entropy 71-72

cross-entropy loss 409-410

cuDNN 442

CV (computer vision) 3-35

applications of 10-15

creating images 13-14

face recognition 15

image classification 10-11

image recommendation systems 15

localization 12

neural style transfer 12

object detection 12

classifier learning algorithms 33-34

extracting features 27-33

automatically extracted features 31-33

handcrafted features 31-33


advantages of 33

overview 27-31

image input 19-22

color images 21-22

computer processing of images 21

images as functions 19-20

image preprocessing 23-26

interpreting devices 8-10

pipeline 4, 17-19, 36

sensing devices 7

vision systems 5-6

AI vision systems 6

human vision systems 5-6

visual perception 5


Darknet-53 325

data

augmenting 180-181

for image classification 187

in AlexNet 206

collecting 162

loading 331-332

mining 414-423

BH 419

BS 421-423

BW 421

dataloader 414-416

finding useful triplets 416-419

normalizing 154-155, 186

preparing for training 151-156, 186-187

preprocessing 153-156

augmenting images 156

grayscaling images 154

resizing images 154

splitting 151-153

data distillation 137

DataGenerator objects 331

dataloader 414-416

datasets

downloading to GANs 364

Kaggle 267

loading 134

MNIST 203, 263

splitting for training 136

splitting for validation 136

validation datasets 152

DCGANs (deep convolutional generative adversarial networks) 345, 362, 365, 370

deep neural network 48

DeepDream 374, 384-399

algorithms in 385-387

in Keras 387-391

deltas 304

dendrites 38

See also fully connected layers

Dense_1 layer 123

Dense_2 layer 123

dependencies, importing 185

deprocess_image(x) 383

design patterns 197-199


measuring speed of 289

multi-stage vs. single-stage 310

diagnosing

overfitting 156-158

underfitting 156-158

dilated convolutions 318

dilation rate 318

dimensionality reduction with Inception 220-223

1 × 1 convolutional layer 220-221

impact on network performance 222

direction 79-80

discriminator 343, 351

discriminator_model method 346, 352

discriminators

in GANs 367

training 352

DL (deep learning) environments

conda environment 440

loading environments 441

manual development environments 439-440

saving environments 441

setting up 439-441

dropout hyperparameter 51

dropout layers 179-180

adding to avoid overfitting 124-128

advantages of 126

in AlexNet 206

in CNN architecture 127-128

overview 125

dropout rate 179

dropout regularization 215


early stopping 175-177

EC2 Management Console 442

EC2 On-Demand Pricing page 442

edges 46

embedding networks, training 423-431

finding similar items 424

implementation 426

object re-identification 424-426

testing trained models 427-431

object re-identification 428-431

retrievals 427-428

embedding space 401

endAnaconda 438

environments

conda 440

developing manually 439-440

loading 441

saving 441

epochs 85, 169, 190

number of 51, 175-177

training 353-354

error functions 68-73

advantages of 69

cross-entropy 71-72

errors 72-73

mean squared error 70-71

overview 69

weights 72-73

errors 72-73

evaluate() method 191, 274, 280

evaluation schemes 358-359

Evaluator class 397

exhaustive search algorithm 294

exploding gradients 230

exponential decay 170


f argument 234

face identification 15, 402

face recognition (FR) 15, 402

face verification 15, 402

false negatives (FN) 148-149, 291

false positives (FP) 148-149, 289

False setting 394

Fashion-MNIST 264, 363, 372

fashion_mnist.load_data() method 341

Fast R-CNNs (region-based convolutional neural networks) 297-299

architecture of 297

disadvantages of 299

multi-task loss function in 298-299

Faster R-CNNs (region-based convolutional neural networks)

architecture of 300

base network to extract features 301-302

fully connected layers 306-307

multi-task loss function 307-308

object detection with 300-308

RPNs 302-306

anchor boxes 304-305

predicting bounding box with regressor 303-304

training 305-306

FC layer 208

FCNs (fully convolutional networks) 48, 120, 303

feature extraction 104, 301-302

automatically 31-33

handcrafted features 31-33

feature extractors 232, 244, 256-258, 297

feature maps 103-104, 108, 241, 243, 250, 252, 396-397

feature vector 18

feature visualizer 381-383

feature_layers 397

features

advantages of 33

handcrafted 31-33

learning 65-66, 252-253

overview 27-31

transferring 254

visualizing 377-381

feedforward process 62-66

learning features 65-66

FID (Fréchet inception distance) 357-358

filter hyperparameter 138

filter_index 381

filters 111-112

filters argument 117, 234

fine label 265

fine-tuning 258-259

advantages of 259

learning rates when 259

transfer learning 274-282

.fit() method 141

fit_generator() function 332

Flatten layer 95-96, 123, 208, 276

flattened vector 119

FLOPs (floating-point operations per second) 77

flow_from_directory() method 269, 276

foreground region 286, 306

FPS (frames per second) 289, 311

freezing layers 247

F-score 149

fully connected layers 101, 119, 212, 306-307

functions

images as 19-20

training 369-370


gallery set 423

GANs (generative adversarial networks) 341-373, 430

applications for 359-362

image-to-image translation 360-361

Pix2Pix GAN 360-361

SRGAN 361-362

architecture of 343-356

DCGANs 345

generator models 348-350

minimax function 354-356

building 362-372

combined models 368-369

discriminators 367

downloading datasets 364

evaluating models of 357-359

choosing evaluation scheme 358-359

FID 358

inception score 358

generators 365-366

importing libraries 364

training 351-354, 370-372

discriminators 352

epochs 353-354

generators 352-353

training functions 369-370

visualizing datasets 364

generative models 342

generator models 348-350

generator_model function 349

generators 343, 351

in GANs 365-366

training 352-353

global average pooling 115

global minima 83

Google Open Images 267

GoogLeNet 217-229

architecture of 226-227

in Keras 225-229

building classifiers 229

building inception modules 228-229

building max-pooling layers 228-229

building network 227

learning hyperparameters in 229

GPUs (graphics processing units) 190, 207, 268, 296, 326, 372, 414, 441

gradient ascent 377

gradient descent (GD) 84-86, 155, 166-167, 184, 377

overview 78

with momentum 174-175

gradients function 382

Gram matrix 396-397

graph transformer network 201


converting color images 23-26

images 154

ground truth bounding box 289-290, 305

GSTE (group-sensitive triplet embedding) 430


hard data mining 416

hard negative sample 417

hard positive sample 417

Heaviside step function 54, 60

height value 122

hidden layers 46-47, 50, 62, 65, 119, 203

hidden units 111

high-recall model 149

human in the loop 162

human vision systems 5-6

hyperbolic tangent function 61



in AlexNet 210

in GoogLeNet 229

in Inception 229

in LeNet-5 202-203

in ResNet 238

in VGGNet 216

neural network hyperparameters 163-164

parameters vs. 163

tuning 162-165

collecting data vs. 162

neural network hyperparameters 163-164

parameters vs. hyperparameters 163


identity function 53, 60

if-else statements 30

image classification 10-11

for color images 133-144

compiling models 140-141

defining model architecture 137-140

evaluating models 144

image preprocessing 134-136

loading datasets 134

loading models with val_acc 143

training models 141-143

with CNNs 121-124

building model architecture 121-122

number of parameters 123-124

weights 123-124

with high accuracy 185-192

building model architecture 187-189

evaluating models 191-192

importing dependencies 185

preparing data for training 186-187

training models 190-191

with MLPs 93-102

drawbacks of 99-102

hidden layers 96

input layers 94-96

output layers 96

image classifier 18

image flattening 95

image preprocessing 33

image recommendation systems 15, 403

ImageDataGenerator class 181, 269, 276

ImageNet 265-266

ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 204, 211, 224, 230, 266, 293

images 19-22

as functions 19-20

augmenting 156

color images 21-22

computer processing of 21

creating 13-14

grayscaling 154

preprocessing 23-26, 134-136

converting color to grayscale 23-26

one-hot encoding 135

preparing labels 135

splitting datasets for training 136

splitting datasets for validation 136

rescaling 135

resizing 154

image-to-image translation 360-361

Inception 217-229

architecture of 223-224

features of 217-218

learning hyperparameters in 229

modules 222-223

naive version 218-219

performance on CIFAR dataset 229

with dimensionality reduction 220-223

1 × 1 convolutional layer 220-221

impact on network performance 222

inception scores 358

inception_module function 225-226

include_top argument 247, 255, 394

input image 33, 385

input layers 46, 62

input vector 39

input_shape argument 122, 188, 237

instances 443

interpreting devices 8-10

IoU (intersection over union) 289-291, 319


Jaccard distance 432

joint training 401

Jupyter notebooks 443-444


K object classes 296

Kaggle datasets 267

Keras API

AlexNet in 207-210

batch normalization in 184

DeepDream in 387-391

GoogLeNet in 225-229

building classifiers 229

building inception modules 228-229

building max-pooling layers 228-229

building network 227

LeNet-5 in 200-201

ResNet in 235-237

keras.datasets 134 file 328

kernel 107

kernel size 112-113

kernel_size hyperparameter 138, 187


L2 regularization 177-179

label smoothing 432

labeled data 6

labeled images 11

LabelImg application 328

labels 135

lambda parameter 178

lambda value 207

layer_name 382

layers 47-48, 138

1 × 1 convolutional 220-221

dropout 179-180

adding to avoid overfitting 124-128

advantages of 126

in AlexNet 206

in CNN architecture 127-128

overview 125

fully connected 101, 119, 306-307

hidden 47

representing style features 396

Leaky ReLU 61-62, 165

learning 166-173

adaptive 170-171

embedding 406-407

features 65-66, 252-253

finding optimal learning rate 169-170


in AlexNet 210

in GoogLeNet 229

in Inception 229

in LeNet-5 202-203

in ResNet 238

in VGGNet 216

mini-batch size 171-173

See also transfer learning

learning curves, plotting 158-159, 191

batch gradient descent 79-80

decay 170-171

derivative and 80

optimal, finding 169-170

when fine-tuning 259

LeNet-5 199-203

architecture of 199

in Keras 200-201

learning hyperparameters in 202-203

on MNIST dataset 203

libraries in GANs 364

linear combination 40

linear datasets 45

linear decay 170

linear transfer function 53, 60

load_data() method 134

load_dataset() method 273, 280

loading

data 331-332

datasets 134

environments 441

models 143

local minima 83

local response normalization 206

localization 12

localization module 293

locally connected layers 101

LocalResponseNorm layer 227

location loss 308

logistic function 55, 61


content loss 393-395

runtime analysis of 412-413

total variance 397

visualizing 334

loss functions 407-413

contrastive loss 410

cross-entropy loss 409-410

naive implementation 412-413

loss value 142-143, 191, 334

lr variable 169

lr_schedule function 202


MAC (multiplier-accumulator) 426

MAC operation 426

machine learning

human brain vs. 10

with handcrafted features 31

main path 233

make_blobs 160

matrices 67

matrix multiplication 67

max pooling 115-116, 200

max-pooling layers 228-229

mean absolute error (MAE) 71

mean average precision (mAP) 285, 289, 292, 317, 424, 427

mean squared error (MSE) 70-71

Mechanical Turk crowdsourcing tool, Amazon 266

metrics 140

min_delta argument 177

mini-batch gradient descent (MB-GD) 77, 84-85, 173, 238

mini-batch size 171-173

minimax function 354-356

mining data 414-423

BH 419

BS 421-423

BW 421

dataloader 414-416

finding useful triplets 416-419

mixed2 layer 389

mixed3 layer 389

mixed4 layer 389

mixed5 layer 389

MLPs (multilayer perceptrons) 45

architecture of 46-47

hidden layers 47

image classification with 93-102

drawbacks of 99-102

hidden layers 96

input layers 94-96

output layers 96

layers 47-48

nodes 47-48

MNIST (Modified National Institute of Standards and Technology) dataset 98, 101, 121, 203, 263, 354

models

architecture of 121-122, 137-140, 187-189

building 328, 330

compiling 140-141

configuring 329

designing 149-150

evaluating 144, 156-161, 191-192

building networks 159-161

diagnosing overfitting 156-158

diagnosing underfitting 156-158

evaluating networks 159-161

plotting learning curves 158-159, 191

training networks 159-161

loading 143

of GANs

choosing evaluation scheme 358-359

evaluating 357-359

FID 358

inception score 358

testing 427-431

object re-identification 428-431

retrievals 427-428

training 141-143, 190-191, 333

momentum, gradient descent with 174-175

monitor argument 177

MS COCO (Microsoft Common Objects in Context) 266-267

multi-scale detections 315-318

multi-scale feature layers 315-319

architecture of 318-319

multi-scale detections 315-318

multi-scale vehicle representation (MSVR) 431

multi-stage detectors 310

multi-task learning (MTL) 433

multi-task loss function 298-299, 307-308


naive implementation 412-413

naive representation 218-219, 222

n-dimensional array 67

neg_pos_ratio 330

networks 162-165, 222

architecture of 137-140

activation type 165

depth of neural networks 164-165

improving 164-165

width of neural networks 164-165

building 159-161

evaluating 159-161

improving 162-165

in Keras 227

measuring precision of 289

predictions 287-288


as classifiers 254-256

as feature extractors 256-258

to extract features 301-302

training 159-161, 397-398

neural networks 36-91

activation functions 51-60

binary classifier 54

Heaviside step function 54

leaky ReLU 59-60

linear transfer function 53

logistic function 55

ReLU 58-59

sigmoid function 55

softmax function 57

tanh 58-59

backpropagation 86-90

covariate shift in 182-183

depth of 164-165

error functions 68-73

advantages of 69

cross-entropy 71-72

errors 72-73

MSE 70-71

overview 69

weights 72-73

feedforward process 62-66

learning features 65-66

hyperparameters in 163-164

learning features 252-253

multilayer perceptrons 45-51

architecture of 46-47

hidden layers 47

layers 47-48

nodes 47-48

optimization 74-77

optimization algorithms 74-86

batch gradient descent 77-83

gradient descent 85-86

MB-GD 84

stochastic gradient descent 83-84

overview 376-377

perceptrons 37-45

learning logic of 43

neurons 43-45

overview 38-42

width of 164-165

neural style transfer 12, 374, 392-399

content loss 393-395

network training 397-398

style loss 396-397

Gram matrix for measuring jointly activated feature maps 396-397

multiple layers for representing style features 396

total variance loss 397

neurons 8, 38, 40, 43-45, 206

new_model 248

NMS (non-maximum suppression) 285, 288-289, 319

no free lunch theorem 163

node values 63

nodes 46-48

noise loss 393

nonlinear datasets 44-45

nonlinearities 51

non-trainable params 124

normalizing data 154-155, 186

installer/application.yaml file 440


load_weights() method 143

object detection 12

framework 285-292

network predictions 287-288

NMS 288-289

object-detector evaluation metrics 289-292

region proposals 286-287

with Fast R-CNNs 297-299

architecture of 297

disadvantages of 299

multi-task loss function in 298-299

with Faster R-CNNs 300-308

architecture of 300

base network to extract features 301-302

fully connected layers 306-307

multi-task loss function 307-308

RPNs 302-306

with R-CNNs 283-297, 310-337

disadvantages of 296-297

limitations of 310

multi-stage detectors vs. single-stage detectors 310

training 296

with SSD 283-310, 319-337

architecture of 311-313

base networks 313-314

multi-scale feature layers 315-319

NMS 319

training SSD networks 326-335

with YOLOv3 283-320, 325-337

architecture of 324-325

overview 321-324

object re-identification 405-406, 424-426, 428-431

object-detector evaluation metrics 289-292

FPS to measure detection speed 289

IoU 289-291

mAP to measure network precision 289

PR curve 291-292

objectness score 285-286, 313, 322

octaves 385-386

offline training 414

one-hot encoding 135, 187

online learning 171

Open Images Challenge 267

open source datasets 262-267

CIFAR 264-265

Fashion-MNIST 264

Google Open Images 267

ImageNet 265-266

Kaggle 267


MS COCO 266-267

optimal weights 74

optimization 74-77

optimization algorithms 74-86, 174-177

Adam (adaptive moment estimation) 175

batch gradient descent 77-83

derivative 80

direction 79-80

gradient 79

learning rate 79-80

pitfalls of 82-83

step size 79-80

early stopping 175-177

gradient descent 85-86

overview 78

with momentum 174-175

MB-GD 84

number of epochs 175-177

stochastic gradient descent 83-84

optimization value 74

optimized weights 250

optimizer 352

output layer 40, 47, 62, 119

Output Shape columns 122

overfitting

adding dropout layers to avoid 124-128

diagnosing 156-158

overview 125

regularization techniques to avoid 177-181

augmenting data 180-181

dropout layers 179-180

L2 regularization 177-179


padding 113-114, 118, 138, 212, 219

PAMTRI (pose aware multi-task learning) 430

parameters

calculating 123-124

hyperparameters vs. 163

non-trainable params 124

number of 123-124

overview 123

trainable params 124

params

non-trainable 124

trainable 124

PASCAL VOC-2012 dataset 293

Path-LSTM 431

patience variable 177

.pem file 442

perceptrons 37-45

learning logic of 43

neurons 43-45

overview 38-42

step activation function 42

weighted sum function 40

performance metrics 146-149

accuracy 147

confusion matrix 147-148

F-score 149

person re-identification 406

pip install 444

Pix2Pix GAN (generative adversarial network) 360-361

plot_generated_images() function 370

plotting learning curves 158-159, 191

POOL layer 208

POOL_1 layer 123

POOL_2 layer 123

pooling layers 114-118, 203, 217

convolutional layers 117-118

max pooling vs. average pooling 115-116

PR curve (precision-recall curve) 291-292

precision 148, 289

predictions 335

across different scales 323-324

bounding box with regressors 303-304

for networks 287-288

with base network 314

preprocessing

data 153-156

augmenting images 156

grayscaling images 154

normalizing data 154-155

resizing images 154

images 23-26, 134-136

converting color images to grayscale images 23-26

one-hot encoding 135

preparing labels 135

splitting datasets for training 136

splitting datasets for validation 136

pretrained model 244

pretrained networks

as classifiers 254-256

as feature extractors 256-258, 268-274

priors 314


query sets 423

Quick, Draw! dataset, Google 372


R-CNNs (region-based convolutional neural networks)

disadvantages of 296-297

limitations of 310

multi-stage detectors vs. single-stage detectors 310

object detection with 283-297, 310-337

training 296

receptive field 108, 213

reduce argument 235

reduce layer 220

reduce shortcut 234

region proposals 286-287

regions of interest (RoIs) 285-286, 293, 295-297, 306, 310

regression layer 299

regressors 303-304

regular shortcut 234

regularization techniques to avoid overfitting 177-181

augmenting data 180-181

dropout layers 179-180

L2 regularization 177-179

ReLU (rectified linear unit) 58-59, 61-62, 96, 106, 111, 118, 139, 151, 160, 165, 188, 199, 203, 205, 209, 212, 231, 366

activation functions 205

leaky 59-60

rescaling images 135

residual blocks 232-235

residual module architecture 230

residual notation 71

resizing images 154

ResNet (Residual Neural Network) 230-238

features of 230-233

in Keras 235-237

learning hyperparameters in 238

performance on CIFAR dataset 238

residual blocks 233-235

results, observing 370-372

retrievals 427-428

RGB (Red Green Blue) 21, 360

RoI extractor 297

RoI pooling layer 297, 300, 308

RoIs (regions of interest) 285-286, 293, 295-297, 306, 310

RPNs (region proposal networks) 302-306

anchor boxes 304-305

predicting bounding box with regressors 303-304

training 305-306

runtime analysis of losses 412-413


s argument 235

save_interval 369

scalar 67

scalar multiplication 67

scales, predictions across 323-324

scipy.optimize.fmin_l_bfgs_b method 398

sensing devices 7

shortcut path 233

Siamese loss 410

sigmoid function 55, 61, 63, 205

single class 306

single-stage detectors 310

skip connections 230-231

Softmax layer 208, 248, 297

source domain 254

spatial features 99-100

specificity 148

splitting

data 151-153

datasets

for training 136

for validation 136

SRGAN (super-resolution generative adversarial networks) 361

SSD (single-shot detector)

architecture of 311-313

base network 313-314

multi-scale feature layers 315-319

architecture of multi-scale layers 318-319

multi-scale detections 315-318

non-maximum suppression 319

object detection with 283-310, 319-337

training networks 326-335

building models 328

configuring models 329

creating models 330

loading data 331-332

making predictions 335

training models 333

visualizing loss 334

SSDLoss function 330

ssh command 443

StackGAN (stacked generative adversarial network) 359

step activation function 42

step function 38

step functions. See Heaviside step function

step size 79-80

stochastic gradient descent (SGD) 77, 83-85, 171, 427

strides 113-114

style loss 396-397

Gram matrix for measuring jointly activated feature maps 396-397

multiple layers for representing style features 396

style_loss function 393, 396

style_weight parameter 395

subsampling 114-118

supervised learning 6

suppression. See NMS (non-maximum suppression)

synapses 38

synset (synonym set) 204, 266


tanh (hyperbolic tangent function) 58-59

tanh activation function 200, 205

TensorFlow Playground 165

tensors 67

testing trained model 427-431

object re-identification 428-431

retrievals 427-428

test_path variable 270

test_targets 280

test_tensors 280

TN (true negatives) 147

to_categorical function 160, 187

top-1 error rate 211

top-5 error rate 211, 216

top-k accuracy 427

total variance loss 397

total variation loss 393

total_loss function 393

total_variation_loss function 397

total_variation_weight parameter 395

TP (true positives) 147, 289

train() function 369-370

trainable params 124

train_acc value 152, 162

training

AlexNet 207

by trial and error 6

discriminators 352

embedding networks 423-431

finding similar items 424

implementation 426

object re-identification 424-426

testing trained models 427-431

epochs 353-354

functions 369-370

GANs 351-354, 370-372

generators 352-353

models 141-143, 190-191, 333

networks 159-161, 397-398

preparing data for 151-156, 186-187

augmenting data 187

normalizing data 186

one-hot encode labels 187

preprocessing data 153-156

splitting data 151-153

R-CNNs 296

RPNs 305-306

splitting datasets for 136

SSD networks 326-335

building models 328

configuring models 329

creating models 330

loading data 331-332

making predictions 335

training models 333

visualizing loss 334

train_loss value 152

train_on_batch method 352

train_path variable 270

transfer functions 51

in GANs 369-370

linear 53

transfer learning 150, 240-282

approaches to 254-259

using pretrained network as classifier 254-256

using pretrained network as feature extractor 256-258

choosing level of 260-262

when target dataset is large and different from source dataset 261

when target dataset is large and similar to source dataset 261

when target dataset is small and different from source 261

when target dataset is small and similar to source dataset 260-261

fine-tuning 258-259, 274-282

open source datasets 262-267

CIFAR 264-265

Fashion-MNIST 264

Google Open Images 267

ImageNet 265-266

Kaggle 267


MS COCO 266-267

overview 243-254

neural networks learning features 252-253

transferring features 254

pretrained networks as feature extractors 268-274

when to use 241-243

transferring features 254

transposition 68

triplets, finding 416-419

tuning hyperparameters 162-165

collecting data vs. 162

neural network hyperparameters 163-164

parameters vs. hyperparameters 163


underfitting 125, 143, 156-158

untrained layers 248

Upsampling layer 350

Upsampling2D layer 348


val_acc 143

val_acc value 142-143, 152, 162

val_error value 157, 176

validation datasets

overview 152

splitting 136

valid_path variable 270

val_loss value 142-143, 152, 169, 191, 334

VAMI (viewpoint attentive multi-view inference) 430

vanishing gradients 62, 230

vector space 67, 401

VeRi dataset 424-425, 428, 430

VGG16 configuration 213, 215-216, 311, 313, 381

VGG19 configuration 213-214, 314

VGGNet (Visual Geometry Group at Oxford University) 212-216

configurations 213-216

features of 212-213

learning hyperparameters in 216

performance 216

vision systems 5-6

AI 6

human 5-6

visual embedding layer 401

visual embeddings 400-436

face recognition 402

image recommendation systems 403

learning embedding 406-407

loss functions 407-413

contrastive loss 410

cross-entropy loss 409-410

naive implementation 412-413

runtime analysis of losses 412-413

mining informative data 414-423

BH 419

BS 421-423

BW 421

dataloader 414-416

finding useful triplets 416-419

training embedding networks 423-431

finding similar items 424

implementation 426

object re-identification 424-426

testing trained models 427-431

visual perception 5

visualizing

datasets 364

features 377-381

loss 334

VUIs (voice user interfaces) 4


warm-up learning rate 432

weight connections 46

weight decay 178

weight layers 199

weight regularization 207

weighted sum 38

weighted sum function 40

weights 72-73, 123-124

calculating parameters 123-124

non-trainable params 124

trainable params 124

weights vector 39

width value 122


X argument 234

x_test 186

x_train 186

x_valid 186


YOLOv3 (you only look once)

architecture of 324-325

object detection with 283-320, 325-337

overview 321-324

output bounding boxes 324

predictions across different scales 323-324


zero-padding 114
