Index
Symbols
Sanger's rule 416
A
activation function 500
activation function, Multilayer Perceptron (MLP)
about 509
rectifier activation function 510, 511, 512
softmax 512
AdaBoost
example, with scikit-learn 468, 469, 470, 471, 473
AdaBoost.M1 456
AdaBoost.R2 465, 466, 467, 468
AdaBoost.SAMME.R 462, 463, 464
AdaDelta
using, with TensorFlow/Keras 540
AdaGrad
using, with TensorFlow/Keras 538
Adaptive Moment Estimation (Adam)
about 536
in TensorFlow/Keras 537
adjacency matrix 134
adjusted Rand index
adversarial training
affinity matrix 134
AIC
used, for determining optimal number of components 366, 367
anti-Hebbian 423
approaches, ensemble learning
bagging (bootstrap aggregating) 441
boosting 441
stacking 442
Apriori Algorithm, in Python
ARIMA model 298
ARMA
used, for modeling non-stationary trend models 297, 298, 299
assumptions, semi-supervised learning
cluster 71
smoothness 70
asymptotically unbiased 33
autocorrelation function 290, 291, 292, 293
autoencoders
autonomous noise component 248
B
backpropagation algorithm
MLP, example with Keras 522, 523, 525, 526, 527
MLP, example with TensorFlow 522, 523, 525, 526, 527
stochastic gradient descent (SGD) 515, 517, 518
weight initialization 518, 520, 521
Back-Propagation Through Time (BPTT)
bagging (bootstrap aggregating) 441
ball trees
about 181
batch normalization
with Keras, example 554, 555, 556
with TensorFlow, example 554, 555, 556
Bayes accuracy 35
Bayes error 35
Bayesian Gaussian Mixture
used, for automatic component selection 368, 369
Bayesian Information Criterion (BIC)
used, for determining optimal number of components 366, 367
Bayesian networks
direct sampling, example 309, 310
sampling from 307
sampling, PyMC3 used 317
Best Linear Unbiased Estimator (BLUE) 253
between-cluster-dispersion (BCD) 225
bias-variance 55
bidimensional discrete convolution 561, 562, 563, 564, 565
bidimensional discrete convolution, parameters
binary decision trees 442
Binomial Negative Log-Likelihood Loss 482
body-mass index (BMI) 264
bootstrap samples 447
bootstrap sampling 441
Broyden-Fletcher-Goldfarb-Shanno (BFGS) 109
C
Calinski-Harabasz score 225, 226
candidate-generating distribution 315
Categorical cross-entropy 51, 52
characteristics, machine learning model
about 27
bias of an estimator 32
variance of estimator 37
CIFAR-10 dataset
reference link
class rebalancing 157
coefficient of determination 255
completeness score
about 197
conditional probabilities 302, 303
conjugate priors 303, 304, 305, 306
Constant Error Carousel (CEC) 592
Constrained Optimization by Linear Approximation (COBYLA) algorithm 114, 118
Contrastive Divergence (CD-k) 665, 666
Contrastive Pessimistic Likelihood Estimation (CPLE) algorithm
summary 110
convolutional operators
bidimensional discrete convolution 561, 562, 563, 564, 565
separable convolution 568, 569
transpose convolution 570
cost function
cost function, examples
about 49
Categorical cross-entropy 51, 52
Huber cost function 50
mean squared error 50
Co-Training
about 94
example, with Wine dataset 96, 97, 98, 99, 100
summary 100
covariance rule
Cramér-Rao bound 39, 40, 41, 42
cropping layers 574
cross-entropy 504
cross-validation (CV) technique 20, 21, 22, 23, 24, 25, 26
D
data augmentation 581, 583, 584, 585, 586
datasets
properties 3
structure 3
datasets scaling
normalization 12
range scaling 8
decision trees
feature importance 453, 454, 455, 456
Deep Belief Network (DBN)
reference link 669
deep convolutional autoencoder
with TensorFlow, example 612, 613, 616, 617, 619, 620
Deep Convolutional GAN (DCGAN)
about 640
with TensorFlow, example 640, 641, 643, 645, 647
deep convolutional network
example, with Keras 574, 575, 576, 578, 579, 581, 582, 583, 584, 585, 586
example, with TensorFlow 574, 575, 578, 579, 581, 582, 583, 584, 585, 586
denoising autoencoders
with TensorFlow, example 621, 622, 623
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
example, with scikit-learn 223, 225
results, analysis 227, 228, 230
detailed balance 312
Dijkstra algorithm 158
Discrete AdaBoost 456
discrete Laplacian operator 562
domain adaptation 5
dropout
using, with TensorFlow and Keras 544, 545, 546, 548, 549, 551
E
ElasticNet 62
ELBO (evidence lower bound) 629
EM Algorithm
convex functions 352, 353, 354
Jensen's inequality, applying 355, 356
ensemble learning
approaches 441
evaluation metrics
expected risk 46
Explainable AI (XAI) 455
exponential smoothing 288
extra-randomized trees 448
F
factor analysis (FA) 471
example, with scikit-learn 378, 379, 380, 381
linear relation analysis 375, 376, 377
factor loading matrix 374
Fashion MNIST deep convolutional autoencoder
FastICA
example, with scikit-learn 397, 398, 399
Fisher information 39
flattening layers 574
forward-backward algorithm
about 332
HMM parameter estimation 335, 336, 337
Fuzzy C-means
example, with SciKit-Fuzzy 210, 211, 212
G
Gated Recurrent Unit (GRU) model 597, 599, 600
GaussianDropout
reference link 551
Gaussian Mixture
example, with scikit-learn 363, 364, 365
optimal number of components, determining with AIC or BIC 366, 367
Generalized Hebbian Learning (GHL) 415
Generalized Hebbian Algorithm (GHA) 415
Generalized Linear Models (GLMs)
Least Squares Estimation 249
Generative Gaussian Mixture
about 74
example 77, 78, 79, 80, 81, 82, 84
summary 84
weighted log-likelihood 84, 85, 86
Gini importance 453
Gini impurity 444
gradient boosting
example, with XGBoost 486, 487, 488, 489, 490
gradient tree boosting
example, with scikit-learn 483, 484, 486
graph Laplacian 135
H
Harmonium 662
Hebb's rule
about 403, 404, 405, 406, 407, 408, 409
heteroscedastic 249
Hidden Markov Models (HMMs) 351
forward-backward algorithm 332
Viterbi algorithm 340
hmmlearn
used, for finding hidden state sequence 341, 342, 343, 344
used, in HMM training 338, 339
HMM training
example, with hmmlearn 337, 338, 339
homogeneity score
about 196
homoscedastic 249
H-spread 9
Huber cost function 50
Huber loss
used, for increasing outlier robustness 260, 261, 262
I
independent and identically distributed (i.i.d.) 5
Independent Component Analysis (ICA)
inductive learning 69
Interquartile Range (IQR) 9
Isomap
isotonic regression
about 284
examples 284
J
Jensen's inequality
EM Algorithm, applying 355, 356
K
k-dimensional (k-d) trees
about 180
Keras
deep convolutional network, example 574, 575, 576, 578, 579
MLP, example 522, 523, 525, 526
URL 522
Keras/TensorFlow implementation
reference link 574
K-Fold cross-validation
Leave-one-out (LOO) 22
Leave-P-out (LPO) 22
Stratified K-Fold 21
K-means
K-means++
K-means, with scikit-learn
k-Nearest Neighbors (KNN) 135, 214
with scikit-learn
Kohonen Maps
L
L1 or Lasso regularization 57, 58, 59, 60, 61
L2 or Ridge regularization 55, 56, 57
label propagation
example 137, 138, 139, 140, 141
label propagation, based on Markov random walks
label spreading
label spreading algorithm
steps 146
Laplacian Regularization
smoothness, increasing 148, 149, 150, 151, 152
Laplacian Spectral Embedding
lasso regression
used, for risk modeling 268, 269, 270
Latent Dirichlet Allocation (LDA) 348
layers
about 573
Least Squares Estimation
Leave-One-Out (LOO) 22, 276
Leave-P-Out (LPO) 22, 276
leptokurtic (super-Gaussian) 395
LIBSVM
reference link 114
limited sample populations 4, 5, 6
linear models
linear regression confidence intervals
computing, with Statsmodels 257, 258, 259
linear regression, with Python
Locally Linear Embedding (LLE)
logistic regression
used, for risk modeling 268, 269, 270
Long Short-Term Memory (LSTM)
about 592, 593, 594, 595, 596, 597
Gated Recurrent Unit (GRU) model 597
with TensorFlow and Keras, example 601, 602, 603, 604, 605
long-term depression (LTD) 408
loss function
about 46
loss functions
for gradient boosting 481, 482, 483
M
machine learning models
characteristics 26
manifold learning
about 158
Market Basket Analysis
with Apriori Algorithm 240, 241, 242
Markov chain Monte Carlo (MCMC) 301
Markov random field (MRF) 659, 660, 661
Markov random walks
label propagation 152, 153, 154
Maximum A Posteriori (MAP) 347, 348, 349, 350
Maximum Likelihood Estimation (MLE) 347, 348, 350, 409
mean absolute error (MAE) 260, 617
mean squared error 50
mean squared error (MSE) 610
metric multidimensional scaling 159
Metropolis-Hastings algorithm
Mexican Hat 429
mini-batch gradient descent 518
Modified LLE (MLLE) 163
momentum 532
Multilayer Perceptron (MLP) 30, 508, 509
example, with Keras 522, 523, 525, 526
example, with TensorFlow 522, 523, 525, 526
Multinomial Negative Log-Likelihood Loss 482
N
Natural Language Processing (NLP) 13
NLopt
reference link 114
non-parametric models 3
non-stationary trend models
Numba
reference link 421
O
Occam's razor principle 32, 42
Oja's rule 414
One-vs-All approach 5
optimization algorithms
gradient perturbation 531
Ordinary Least Squares (OLS) 57, 249
Ordinary or Generalized Least Squares algorithms 50
P
padding layers 574
parameter estimation
parametric learning process 3
parametric models 3
Pasting approach 441
perceptron 30, 499, 501, 502, 503, 504
example, with scikit-learn 504, 506, 507
Platt scaling 104
platykurtic (sub-Gaussian) 395
polynomial regression
examples 277, 279, 281, 282, 283, 284
pooling layers
Principal Component Analysis (PCA) 159, 411, 471
example, with scikit-learn 387, 388
importance evaluation 384, 385, 386
probability density function (p.d.f.) 303
proper Bagging 441
pseudo-probability vectors 595
PyMC3
reference link 319
sampling process, executing 321, 323, 324
used, for sampling 317, 318, 320, 321
PyMC3 API
reference link 318
PyStan
reference link 327
used, for sampling 325, 327, 328, 329, 330
Python
supervised DBN, example 672, 673, 674
unsupervised DBN, example 669, 670, 671, 672
Q
quasi-noiseless scenarios 386
R
radial basis function (RBF) 214
radial basis function (RBF) kernel 135, 390
random forest
and bias-variance trade-off 446, 447, 448, 449
example, with scikit-learn 449, 450, 451, 452, 453
feature importance 453, 454, 455, 456
range scaling 8
Rayleigh-Ritz method 164
real-world evidence (RWE) 271
rectifier activation function 510, 511, 512
recurrent networks
Back-Propagation Through Time (BPTT) 589, 590
recurrent neural networks (RNNs) 587
regression techniques
about 263
isotonic regression 284
lasso regression 268
logistic regression 268
polynomial regression 273
ridge regression 263
using, in TensorFlow and Keras 542, 543
regularization techniques, examples
about 55
ElasticNet 62
L1 or Lasso regularization 57, 58, 59, 60, 61
L2 or Ridge regularization 55, 56, 57
representational capacity 29
residual 249
Restricted Boltzmann Machine (RBM) 662, 663, 664
ridge regression
about 263
example, with scikit-learn 264, 265, 266, 267, 268
RMSProp
in TensorFlow/Keras 535
Rubner-Tavan's network
S
saddle points 48
Sanger's network
SciKit-Fuzzy
reference link 210
used, with example of Fuzzy C-means 210, 211, 212
scikit-learn
gradient tree boosting, example 483, 484, 486
label propagation 141, 142, 143
used, with example of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) 223, 224
used, with example of spectral clustering 217, 219, 220
voting classifiers, example 495, 496, 497
scikit-learn classes
MinMaxScaler 10
RobustScaler 10
StandardScaler 10
self-organizing maps (SOMs)
example 432, 433, 434, 435, 436
self-training
about 87
example, with Iris dataset 90, 91, 92, 93
summary 93
semi-supervised learning
about 65
assumptions 70
scenarios 66
semi-supervised learning, scenarios
inductive learning 69
transductive learning 69
semi-supervised Support Vector Machines (S3VM)
about 110
implementing, in Python 114, 115, 116, 117, 118, 119
summary 119
separable convolution 568, 569
Sequential Least Squares Programming (SLSQP) 118
SHAP
about 490
reference link 490
shattering 30
Silhouette score
Singular Value Decomposition (SVD) 16, 234, 382
smoothness
increasing, with Laplacian Regularization 148, 149, 150, 151, 152
softmax 512
softmax function 5
sparse autoencoders
about 623
sparse coding 59
Spectral Biclustering, with scikit-learn
spectral clustering 213
example, with scikit-learn 217, 219, 220
stacking approach 442
Stagewise Additive Modeling using a Multi-class Exponential (SAMME) 460
state-of-the-art models 586
Statsmodels
used, for computing linear regression confidence intervals 257, 258, 259
stochastic gradient descent (SGD) 50, 516, 517, 518, 529
with Momentum in TensorFlow/Keras 533, 534
Stratified K-Fold 21
supervised DBN
example, in Python 672, 673, 674
Support Vector Machines (SVMs) 7
Support Vector Machine (SVM) 50, 110, 669
Swish function 512
T
t-distributed stochastic neighbor embedding
TensorFlow
deep convolutional autoencoder, example 612, 613, 614, 616, 617, 619, 620
deep convolutional network, example 574, 575, 576, 578, 579
denoising autoencoder, example 621, 622, 623
MLP, example 522, 523, 525, 526
URL 522
TensorFlow/Keras
AdaDelta, using 539
AdaGrad, using 538
Adam, using 537
batch normalization, using 554, 555
dropout, using 544, 545, 546, 549, 551
LSTM, using with 601, 602, 603, 604, 605
regularization, using 541, 542, 543
RMSProp, using 535
SGD with Momentum 533
Tikhonov regularization 55
time-series
about 285
smoothing procedure 287, 288, 289
transductive learning 69
Transductive Support Vector Machines (TSVM)
about 120
configuration analyzing 126, 127, 128, 129
implementing, in Python 121, 122, 123, 124, 125, 126
summary 130
transfer learning 605, 606, 607, 608
transpose convolution 570
Truncated Backpropagation Through Time (TBPTT) 590
t-SNE
U
unbiased estimator 32
unsupervised DBN
example, in Python 669, 670, 671, 672
upsampling layers 574
V
valid padding 565
Vapnik-Chervonenkis capacity (VC-capacity) 30, 31
Vapnik-Chervonenkis theory 30, 31
Vapnik's principle 69
variance of an estimator
about 37
Cramér-Rao bound 39, 40, 41, 42
variance scaling 520
variational autoencoder (VAE)
example, with TensorFlow 630, 633, 634
VC-dimension 31
Viterbi algorithm
used, for finding hidden state sequence 341, 342, 343, 344
voting classifiers
example, with scikit-learn 495, 496, 497
W
Wasserstein GAN
with TensorFlow, example 652, 654, 655, 656
weak learners 440
Weighted Least Squares (WLS) 251
weight initialization 518, 520, 521
weight shrinkage 55
weight vector stabilization 414
whitening
advantages 15
versus original dataset 16
within-cluster-dispersion (WCD) 225
X
Xavier initialization 521
XGBoost
features, evaluating 490, 491, 493
gradient boosting, example 486, 487, 488, 489, 490
reference link 486
Z
zero-centering 7
z-score 7