Index

Symbols

Sanger's rule 416

A

activation function 500

activation function, Multilayer Perceptron (MLP)

about 509

hyperbolic tangent 509, 510

rectifier activation function 510, 511, 512

sigmoid 509, 510

softmax 512

AdaBoost

about 456, 457, 458, 459, 460

example, with scikit-learn 468, 469, 470, 471, 473

AdaBoost.M1 456

AdaBoost.R2 465, 466, 467, 468

AdaBoost.SAMME 460, 461

AdaBoost.SAMME.R 462, 463, 464

AdaDelta

about 539, 540

using, with TensorFlow/Keras 540

AdaGrad

using, with TensorFlow/Keras 538

Adaptive Moment Estimation (Adam)

about 536

in TensorFlow/Keras 537

adjacency matrix 134

adjusted Rand index

about 197, 198

adversarial training

about 635, 636, 637, 638, 639

affinity matrix 134

Akaike Information Criterion (AIC)

used, for determining optimal number of components 366, 367

anti-Hebbian 423

approaches, ensemble learning

bagging (bootstrap aggregating) 441

boosting 441

stacking 442

Apriori Algorithm, in Python

example 242, 243, 244, 246

AR 293, 295, 296, 297

ARIMA model 298

ARMA

about 293, 294, 295, 296, 297

used, for modeling non-stationary trend models 297, 298, 299

artificial neuron 499, 500

assumptions, semi-supervised learning

cluster 71

manifold 72, 73, 74

smoothness 70

asymptotically unbiased 33

atrous convolution 567, 568

autocorrelation function 290, 291, 292, 293

autoencoders

about 609, 610, 611

autonomous noise component 248

B

backpropagation algorithm

about 512, 513, 514, 515

MLP, example with Keras 522, 523, 525, 526, 527

MLP, example with TensorFlow 522, 523, 525, 526, 527

stochastic gradient descent (SGD) 515, 517, 518

weight initialization 518, 520, 521

Back-Propagation Through Time (BPTT)

about 589, 590

limitations 591, 592

bagging (bootstrap aggregating) 441

ball trees

about 181

batch normalization

about 551, 552, 553

with Keras, example 554, 555, 556

with TensorFlow, example 554, 555, 556

Bayes accuracy 35

Bayes error 35

Bayesian Gaussian Mixture

used, for automatic component selection 368, 369

Bayesian Information Criterion (BIC)

used, for determining optimal number of components 366, 367

Bayesian networks

about 306, 307

direct sampling 308, 309

direct sampling, example 309, 310

sampling from 307

sampling, PyMC3 used 317

Bayes' theorem 302, 303

Best Linear Unbiased Estimator (BLUE) 253

between-cluster-dispersion (BCD) 225

bias of an estimator 32, 33

bias-variance trade-off 55

biclustering 234, 235

bidimensional discrete convolution 561, 562, 563, 564, 565

bidimensional discrete convolution, parameters

padding 565, 566

strides 565, 566

binary decision trees 442

Binomial Negative Log-Likelihood Loss 482

body-mass index (BMI) 264

boosting approach 441, 442

bootstrap samples 447

bootstrap sampling 441

Broyden-Fletcher-Goldfarb-Shanno (BFGS) 109

C

Calinski-Harabasz score 225, 226

candidate-generating distribution 315

Categorical cross-entropy 51, 52

characteristics, machine learning model

about 27

bias of an estimator 32

capacity 29, 30

variance of an estimator 37

CIFAR-10 dataset

reference link

class rebalancing 157

coefficient of determination 255

completeness score

about 197

conditional probabilities 302, 303

conjugate priors 303, 304, 305, 306

Constant Error Carousel (CEC) 592

Constrained Optimization by Linear Approximation (COBYLA) algorithm 114, 118

Contrastive Divergence (CD-k) 665, 666

Contrastive Pessimistic Likelihood Estimation (CPLE) algorithm

about 103, 104

example 106, 107, 108, 109

summary 110

theory 104, 105

convolutional operators

about 559, 560, 561

atrous convolution 567, 568

bidimensional discrete convolution 561, 562, 563, 564, 565

separable convolution 568, 569

transpose convolution 570

cost function

about 47, 49

defining 45, 46, 47, 48, 49

cost function, examples

about 49

Categorical cross-entropy 51, 52

Hinge cost function 50, 51

Huber cost function 50

mean squared error 50

Co-Training

about 94

example, with Wine dataset 96, 97, 98, 99, 100

summary 100

theory 94, 95, 96

covariance rule

analysis 409, 410, 411, 412

application, example 412, 413

Cramér-Rao bound 39, 40, 41, 42

cropping layers 574

cross-entropy 504

cross-validation (CV) technique 20, 21, 22, 23, 24, 25, 26

D

data augmentation 581, 583, 584, 585, 586

datasets

properties 3

structure 3

datasets, scaling

about 7, 8

normalization 12

range scaling 8

robust scaling 8, 9, 10, 11

Davies-Bouldin score 226, 227

decision trees

about 445, 446

feature importance 453, 454, 455, 456

Deep Belief Network (DBN)

about 666, 667, 668, 669

reference link 669

deep convolutional autoencoder

with TensorFlow, example 612, 613, 616, 617, 619, 620

Deep Convolutional GAN (DCGAN)

about 640

collapsing 647, 648

with TensorFlow, example 640, 641, 643, 645, 647

deep convolutional network

about 557, 558, 559

example, with Keras 574, 575, 576, 578, 579, 581, 582, 583, 584, 585, 586

example, with TensorFlow 574, 575, 578, 579, 581, 582, 583, 584, 585, 586

degree matrix 135, 215

denoising autoencoders

about 620, 621

with TensorFlow, example 621, 622, 623

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

about 220, 221, 222, 223

example, with scikit-learn 223, 225

results, analysis 227, 228, 230

detailed balance 312

Dijkstra algorithm 158

Discrete AdaBoost 456

discrete Laplacian operator 562

domain adaptation 5

dropout

about 540, 543

using, with TensorFlow and Keras 544, 545, 546, 548, 549, 551

E

early stopping 62, 63

ElasticNet 62

ELBO (evidence lower bound) 629

EM Algorithm

about 350, 351, 352

convex functions 352, 353, 354

Jensen's inequality, applying 355, 356

empirical risk 47, 112

ensemble learning

about 439, 440, 441

approaches 441

as model selection 497, 498

evaluation metrics

about 194, 195, 196

expected risk 46

Explainable AI (XAI) 455

exponential smoothing 288

extra-randomized trees 448

F

factor analysis (FA) 471

about 373, 374

example, with scikit-learn 378, 379, 380, 381

linear relation analysis 375, 376, 377

factor loading matrix 374

Fashion MNIST deep convolutional autoencoder

sparseness, adding 624, 627

FastICA

example, with scikit-learn 397, 398, 399

feature selection 59, 455

Fisher information 39

flattening layers 574

forward-backward algorithm

about 332

backward phase 334, 335

forward phase 332, 333, 334

HMM parameter estimation 335, 336, 337

Fuzzy C-means

about 205, 206, 207, 208, 209

example, with SciKit-Fuzzy 210, 211, 212

G

Gated Recurrent Unit (GRU) model 597, 599, 600

GaussianDropout

reference link 551

Gaussian Mixture

about 359, 360, 361, 362, 363

example, with scikit-learn 363, 364, 365

optimal number of components, determining with AIC or BIC 366, 367

Generalized Hebbian Learning (GHL) 415

Generalized Hebbian Algorithm (GHA) 415

Generalized Linear Models (GLMs)

about 247, 248

Least Squares Estimation 249

Generative Gaussian Mixture

about 74

example 77, 78, 79, 80, 81, 82, 84

summary 84

theory 74, 75, 76, 77

weighted log-likelihood 84, 85, 86

Gibbs sampling 312, 313, 314

Gini importance 453

Gini impurity 444

gradient boosting

about 477, 478, 479, 480, 481

example, with XGBoost 486, 487, 488, 489, 490

loss functions 481, 482, 483

gradient tree boosting

example, with scikit-learn 483, 484, 486

graph Laplacian 135

H

Harmonium 662

Hebb's rule

about 403, 404, 405, 406, 407, 408, 409

heteroscedastic 249

Hidden Markov Models (HMMs) 351

about 330, 331, 332, 399

forward-backward algorithm 332

Viterbi algorithm 340

Hinge cost function 50, 51

hmmlearn

used, for finding hidden state sequence 341, 342, 343, 344

used, in HMM training 338, 339

HMM training

example, with hmmlearn 337, 338, 339

homogeneity score

about 196

homoscedastic 249

H-spread 9

Huber cost function 50

Huber loss

used, for increasing outlier robustness 260, 261, 262

hyperbolic tangent 509, 510

I

independent and identically distributed (i.i.d.) 5

Independent Component Analysis (ICA)

about 394, 395, 396, 397

inductive learning 69

Interquartile Range (IQR) 9

Isomap

about 158, 159

example 160, 161, 162

isotonic regression

about 284

examples 284

J

Jensen's inequality

EM Algorithm, applying 355, 356

in EM Algorithm 352, 353, 354

K

k-dimensional (k-d) trees

about 180

Keras

deep convolutional network, example 574, 575, 576, 578, 579

MLP, example 522, 523, 525, 526

URL 522

Keras/TensorFlow implementation

reference link 574

kernel PCA 389, 390, 391, 392

K-Fold cross-validation

about 20, 21

Leave-one-out (LOO) 22

Leave-P-out (LPO) 22

Stratified K-Fold 21

K-means

about 186, 187, 188, 189, 190

K-means++

about 190, 191

K-means, with scikit-learn

examples 192, 193, 194

K-Nearest Neighbors (KNN) 135, 214

about 175, 176, 177, 178, 179

model, fitting 182, 183

KNN, with scikit-learn

examples 183, 185, 186

Kohonen Maps

about 430, 431, 432

L

L1 or Lasso regularization 57, 58, 59, 60, 61

L2 or Ridge regularization 55, 56, 57

label propagation

about 134, 135

example 137, 138, 139, 140, 141

in scikit-learn 141, 142, 143

steps 136, 137

label propagation, based on Markov random walks

about 152, 153, 154

example 154, 156, 157

label spreading

about 144, 145

example 147, 148

label spreading algorithm

steps 146

Laplacian Regularization

smoothness, increasing 148, 149, 150, 151, 152

Laplacian Spectral Embedding

about 166, 167

example 167, 168

lasso regression

example 270, 271, 272, 273

used, for risk modeling 268, 269, 270

Latent Dirichlet Allocation (LDA) 348

layers

about 573

Least Squares Estimation

about 249, 250, 251, 252

bias 252, 253

variance 252, 253

Leave-One-Out (LOO) 22, 276

Leave-P-Out (LPO) 22, 276

Leptokurtotic (super-Gaussian) 395

LIBSVM

reference link 114

limited sample populations 4, 5, 6

linear models

for time-series 289, 290

linear regression confidence intervals

computing, with Statsmodels 257, 258, 259

linear regression, with Python

example 254, 255, 256

Locally Linear Embedding (LLE)

about 162, 163, 164

example 164, 165

logistic regression

example 270, 271, 272, 273

used, for risk modeling 268, 269, 270

Long Short-Term Memory (LSTM)

about 592, 593, 594, 595, 596, 597

Gated Recurrent Unit (GRU) model 597

with TensorFlow and Keras, example 601, 602, 603, 604, 605

long-term depression (LTD) 408

loss function

about 46

defining 45, 46, 47, 48, 49

loss functions

for gradient boosting 481, 482, 483

M

MA 293, 294, 295, 296, 297

machine learning models

characteristics 26

working, with data 2, 3

manifold learning

about 158

Market Basket Analysis

with Apriori Algorithm 240, 241, 242

Markov chain Monte Carlo (MCMC) 301

Markov Chains 310, 311, 312

Markov random field (MRF) 659, 660, 661

Markov random walks

label propagation 152, 153, 154

Maximum A Posteriori (MAP) 347, 348, 349, 350

Maximum Likelihood Estimation (MLE) 347, 348, 350, 409

mean absolute error (MAE) 260, 617

mean squared error 50

mean squared error (MSE) 610

metric multidimensional scaling 159

Metropolis-Hastings algorithm

about 314, 315

example 316, 317

Mexican Hat 429

mini-batch gradient descent 518

Modified LLE (MLLE) 163

momentum 532

Multilayer Perceptron (MLP) 30, 508, 509

example, with Keras 522, 523, 525, 526

example, with TensorFlow 522, 523, 525, 526

Multinomial Negative Log-Likelihood Loss 482

N

Natural Language Processing (NLP) 13

Nesterov momentum 532, 533

NLopt

reference link 114

non-parametric models 3

non-stationary trend models

modeling, with ARMA 298, 299

normalization 12, 13, 14

Numba

reference link 421

O

Occam's razor principle 32, 42

Oja's rule 414

One-vs-All approach 5

optimization algorithms

about 529, 530, 531

gradient perturbation 531

Ordinary Least Squares (OLS) 57, 249

Ordinary or Generalized Least Squares algorithms 50

overfitting 33, 37

P

padding 565, 566

padding layers 574

parameter estimation

example 356, 357, 358, 359

parametric learning process 3

parametric models 3

Pasting approach 441

perceptron 30, 499, 501, 502, 503, 504

example, with scikit-learn 504, 506, 507

Platt scaling 104

Platykurtotic (sub-Gaussian) 395

polynomial regression

about 273, 275, 276

examples 277, 279, 281, 282, 283, 284

pooling layers

about 570, 571, 573

Principal Component Analysis (PCA) 159, 411, 471

about 382, 383

example, with scikit-learn 387, 388

importance evaluation 384, 385, 386

kernel PCA 389, 390, 391, 392

Sparse PCA 392, 393

probability density function (p.d.f.) 303

proper Bagging 441

pseudo-probability vectors 595

PyMC3

reference link 319

sampling process, executing 321, 323, 324

used, for sampling 317, 318, 320, 321

PyMC3 API

reference link 318

PyStan

reference link 327

used, for sampling 325, 327, 328, 329, 330

Python

supervised DBN, example 672, 673, 674

unsupervised DBN, example 669, 670, 671, 672

Q

quasi-noiseless scenarios 386

R

radial basis function (RBF) kernel 135, 214, 390

random forest

about 442, 444, 445

and bias-variance trade-off 446, 447, 448, 449

example, with scikit-learn 449, 450, 451, 452, 453

feature importance 453, 454, 455, 456

range scaling 8

Rayleigh-Ritz method 164

real-world evidence (RWE) 271

rectifier activation function 510, 511, 512

recurrent networks

about 587, 588, 589

Back-Propagation Through Time (BPTT) 589, 590

recurrent neural networks (RNNs) 587

regression techniques

about 263

isotonic regression 284

lasso regression 268

logistic regression 268

polynomial regression 273

ridge regression 263

regularization 39, 53, 54

about 540, 541

using, in TensorFlow and Keras 542, 543

regularization techniques, examples

about 55

early stopping 62, 63

ElasticNet 62

L1 or Lasso regularization 57, 58, 59, 60, 61

L2 or Ridge regularization 55, 56, 57

representational capacity 29

residual 249

Restricted Boltzmann Machine (RBM) 662, 663, 664

ridge regression

about 263

example, with scikit-learn 264, 265, 266, 267, 268

RMSProp

about 534, 535

in TensorFlow/Keras 535

robust scaling 8, 9, 10, 11

Rubner-Tavan's network

about 421, 422, 423, 424, 425

example 425, 427

S

saddle points 48

Sanger's network

about 415, 416, 417, 418

example 418, 419, 420, 421

SciKit-Fuzzy

reference link 210

used, with example of Fuzzy C-means 210, 211, 212

scikit-learn

gradient tree boosting, example 483, 484, 486

label propagation 141, 142, 143

used, with example of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) 223, 224

used, with example of spectral clustering 217, 219, 220

voting classifiers, example 495, 496, 497

scikit-learn classes

MinMaxScaler 10

RobustScaler 10

StandardScaler 10

self-organizing maps (SOMs)

about 428, 429

example 432, 433, 434, 435, 436

Kohonen Maps 430, 431, 432

self-training

about 87

example, with Iris dataset 90, 91, 92, 93

summary 93

theory 87, 88, 89

semi-supervised learning

about 65

assumptions 70

scenarios 66

semi-supervised learning, scenarios

about 66, 67

causal scenarios 67, 68

inductive learning 69

transductive learning 69

semi-supervised Support Vector Machines (S3VM)

about 110

implementing, in Python 114, 115, 116, 117, 118, 119

summary 119

theory 110, 111, 112, 113

separable convolution 568, 569

Sequential Least Squares Programming (SLSQP) 118

SHAP

about 490

reference link 490

shattering 30

sigmoid 509, 510

Silhouette score

about 198, 199, 200, 201

Singular Value Decomposition (SVD) 16, 234, 382

smoothness

increasing, with Laplacian Regularization 148, 149, 150, 151, 152

softmax 512

softmax function 5

sparse autoencoders

about 623

sparse coding 59

Sparse PCA 392, 393

Spectral Biclustering, with scikit-learn

example 236, 237, 238, 239

spectral clustering 213

about 214, 215, 216

example, with scikit-learn 217, 219, 220

stacking approach 442

Stagewise Additive Modeling using a Multi-class Exponential (SAMME) 460

state-of-the-art models 586

Statsmodels

used, for computing linear regression confidence intervals 257, 258, 259

stochastic gradient descent (SGD) 50, 516, 517, 518

about 529

with Momentum in TensorFlow/Keras 533, 534

Stratified K-Fold 21

strides 565, 566

supervised DBN

example, in Python 672, 673, 674

Support Vector Machine (SVM) 7, 50, 110, 669

Swish function 512

T

t-distributed stochastic neighbor embedding

example 169, 171, 172

TensorFlow

deep convolutional autoencoder, example 612, 613, 614, 616, 617, 619, 620

deep convolutional network, example 574, 575, 576, 578, 579

denoising autoencoder, example 621, 622, 623

MLP, example 522, 523, 525, 526

URL 522

TensorFlow/Keras

AdaDelta, using 539

AdaGrad, using 538

Adam, using 537

batch normalization, using 554, 555

dropout, using 544, 545, 546, 549, 551

LSTM, using with 601, 602, 603, 604, 605

regularization, using 541, 542, 543

RMSProp, using 535

SGD with Momentum 533

test set 17, 18, 19

Tikhonov regularization 55

time-series

about 285

linear models 289, 290

smoothing procedure 287, 288, 289

working with 286, 287

training set 17, 18, 19

transductive learning 69

Transductive Support Vector Machines (TSVM)

about 120

configuration, analyzing 126, 127, 128, 129

implementing, in Python 121, 122, 123, 124, 125, 126

summary 130

theory 120, 121

transfer learning 605, 606, 607, 608

transpose convolution 570

Truncated Backpropagation Through Time (TBPTT) 590

t-SNE

about 168, 169

U

unbiased estimator 32

underfitting 33, 34, 35, 36

unsupervised DBN

example, in Python 669, 670, 671, 672

upsampling layers 574

V

validation set 17, 18, 19

valid padding 565

Vapnik-Chervonenkis-capacity (VC-capacity) 30, 31

Vapnik-Chervonenkis theory 30, 31

Vapnik's principle 69

variance of an estimator

about 37

Cramér-Rao bound 39, 40, 41, 42

overfitting 37, 38, 39

variance scaling 520

variational autoencoder (VAE)

about 627, 628, 629, 630

example, with TensorFlow 630, 633, 634

VC-dimension 31

Viterbi algorithm

about 340, 341

used, for finding hidden state sequence 341, 342, 343, 344

voting classifiers

ensemble 493, 494, 495

example, with scikit-learn 495, 496, 497

W

Wasserstein GAN

about 649, 650, 651, 652

with TensorFlow, example 652, 654, 655, 656

weak learners 440

Weighted Least Squares (WLS) 251

weight initialization 518, 520, 521

weight shrinkage 55

weight vector stabilization 414

whitening

about 15, 16

advantages 15

versus original dataset 16

within-cluster-dispersion (WCD) 225

X

Xavier initialization 521

XGBoost

features, evaluating 490, 491, 493

gradient boosting, example 486, 487, 488, 489, 490

reference link 486

Z

zero-centering 7

z-score 7
