in finding nearest-neighbor, 137–138, 137f
for training instances, 137f
updating, 138–139
Keras, 465–466
Kernel density estimation, 361–362
Kernel logistic regression, 261
Kernel perceptron, 260–261
Kernel regression, 403
Kernel ridge regression, 258–259
computational expense, 259
computational simplicity, 259
drawback, 259
Kernels, conditional probability models using, 402–403
Kernel trick, 258
K-means algorithm, 355
K-means clustering, 142–143
iterations, 144
k-means++, 144
seeds, 144
K-nearest-neighbor method, 85
Knowledge, 37
background, 508
metadata, 513
prior domain, 513
Knowledge Flow interface, 554, 555, 564–567
See also Weka workbench
Associations panel, 566
Classifiers folder, 566
Clusters folder, 566
components, 566
components configuration and connection, 566–567
dataSet connections, 566–567
evaluation components, 566
Evaluation folder, 564–565
Filters folder, 566
incremental learning, 567
starting up, 564–565
Knowledge representation, 91
clusters, 87
instance-based, 84–87
linear models, 68–70
rules, 75–84
tables, 68
trees, 70–75
KStar algorithm, 284

L

L2 regularization, 399
Labor negotiations example, 16–18
dataset, 17t
decision trees, 18f
training dataset, 18
LADTree algorithm, 501
Language bias, 33–34
Language identification, 516
Laplace distribution, 393–394
Laplace estimator, 99, 358–359
Large item sets, finding with association rules, 240–241
Lasagne, 465–466
LatentSemanticAnalysis method, 376–377
Latent Dirichlet allocation (LDA), 379–381
Latent semantic analysis (LSA), 376–377
Latent variables, See Hidden variables
LaTeX typesetting system, 571
Lattice-structured models, 408
Law of diminishing returns, 507
Layerwise training, 448
Lazy classifiers, in Weka, 563
Learning
association, 44
batch, 268–269
classification, 44
concept, 54
cost-sensitive, 65–66
data stream, 509–512
deep, See Deep learning
ensemble, 479
incremental, 567
instance-based, 84–85, 135–141, 244–252
locally weighted, 281–283
machine, 7–9
multi-instance, 53, 156–158, 472–476
one-class, 288, 319–320
in performance situations, 21
rote, 84–85
semisupervised, 468–472
statistics versus, 30–31
testing, 8
Learning algorithms, 563
Bayes, 563
functions, 563
lazy, 563
miscellaneous, 563
rules, 563
trees, 563
Learning Bayesian networks, 344–347
Learning paradigms, 508
Learning rate, 267, 268
and schedules, 434–435
Least-squares linear regression, 70, 129
Least Absolute Shrinkage and Selection Operator (LASSO), 394
Leave-one-out cross-validation, 169
Level-0 models, 497–498
Level-1 model, 497–498
LibLINEAR algorithm, 284
LibSVM algorithm, 284
Lift charts, 183–186
data for, 184t
illustrated, 185f
points on, 194
Lift factor, 183–184
Likelihood, 337
Linear chain conditional random fields, 408–409
Linear classification
logistic regression, 129–131
using the perceptron, 131–133
using Winnow, 133–135
Linear discriminant analysis, 310
Linear machines, 159
Linear models, 68–70, 128–135
in binary classification problems, 69
boundary decision, 69
extending, 252–273
generating, 252
illustrated, 69f, 70f
kernel ridge regression, 258–259
linear classification, 129–131
linear regression, 128–129
local, numeric prediction with, 273–284
logistic regression, 129–131
maximum-margin hyperplane, 253–254
in model tree, 280t
multilayer perceptrons, 261–269
nonlinear class boundaries, 254–256
numeric prediction, 128–129
perceptron, 131–133
stochastic gradient descent, 270–272
support vector machine use, 252
support vector regression, 256–258
in two dimensions, 68
Linear regression, 128–129, 392–393
least-squares, 70, 129
locally weighted, 281–283
matrix vector formulations, 394–395
multiple, 491
multiresponse, 129
Linear threshold unit, 159
LinearForwardSelection method, 334
LinearRegression algorithm, 160
LMT algorithm, 160
Load forecasting, 24–25
Loading files, 560–561
Locally weighted linear regression, 281–283
distance-based weighting schemes, 282
in nonlinear function approximation, 282
Logic programs, 84
Logistic model trees, 496–497
Logistic regression, 129–131, 398–399
additive, 492–493
calibration, 330
generalizing, 131
illustrated, 130f
model, 382–385
two-class, 131
LogitBoost algorithm, 492, 493
Log-likelihood, 338
Log-normal distribution, 357–358
Log-odds distribution, 357–358
Long short-term memory (LSTM), 457, 458
Loss functions
0–1, 176
informational, 178–179
quadratic, 177–178
LWL algorithm, 284

M

M5P algorithm, 284
M5Rules algorithm, 284
Machine learning, 7–9
applications, 9
in diagnosis applications, 25
expert models, 480
modern, 467
schemes, 209–210
statistics and, 30–31
Manufacturing process applications, 27–28
Market basket analysis, 26–27, 120
Marketing and sales, 26–27
churn, 26
direct marketing, 27
historical analysis, 27
market basket analysis, 26–27
Marginal likelihood, 355, 363
Marginal log-likelihood for PPCA, 374
Marginal probabilities, 387
Markov blanket, 347–348, 348f, 369
Markov chain Monte Carlo methods, 368–369
Markov models, 403–404
Markov networks, 352
Markov random fields, 385–386, 407–408
Massive datasets, 506–509
Massive Online Analysis (MOA), 512
Max-product algorithms, 391
Max-sum algorithm, See Max-product algorithms
Maximization, 357
Maximum-margin hyperplane, 253–254
illustrated, 253f
support vectors, 253–254
Maximum likelihood estimation, 338–339
Maximum posteriori parameter estimation, 339
Mean-absolute errors, 195
Mean-squared errors, 195
Mean function, 401
Memory usage, 511–512
MetaCost algorithm, 484
Metadata, 56, 512
application examples, 512–513
knowledge, 513
relations among attributes, 513
Metalearners, 563
Metalearning algorithms, in Weka, 563
Metric trees, 141
Metropolis–Hastings algorithm, 368–369
MIDD algorithm, 478
MILR algorithm, 478
Mini-batch-based stochastic gradient descent, 433–434
pseudocode for, 434, 435f
Minimum description length (MDL) principle, 179, 197–200
applying to clustering, 200–201
metric, 346
probability theory and, 199
training instances, 198
MIOptimalBall algorithm, 478
MISMO algorithm, 478
Missing values, 62–63
classification rules, 223–224
decision trees, 70–71, 212–213
distance function, 136
instances with, 212
machine learning schemes and, 63
mixture models, 358
Naïve Bayes, 100
partial decision trees, 230–231
reasons for, 63
MISVM algorithm, 478
MIWrapper algorithm, 160
Mixed-attribute problems, 11
Mixed National Institute of Standards and Technology (MNIST), 421–422, 421t
Mixture models, 353, 370
extending, 356–358
finite mixtures, 353
missing values, 357
nominal attributes, 357
two-class, 354f
Mixture of Gaussians
expectation maximization algorithm, 353–356
Mixtures, 353
of factor analyzers, 360–361
of principal component analyzers, 360–361
Model’s expectation, 451
Model trees, 75, 273, 274–275
building, 275
illustrated, 74f
induction pseudocode, 277–281, 278f
linear models in, 280t
logistic, 496–497
with nominal attributes, 279f
pruning, 275–276
rules from, 281
smoothing calculation, 274
Multiclass prediction, 181
MultiClassClassifier algorithm, 334
Multiclass classification problem, 396
Multiclass logistic regression, 396–400
matrix vector formulation, 397–398
priors on parameters, 398–400
Multi-instance learning, 53, 156–158, 472
See also Semisupervised learning
aggregating the input, 157
aggregating the output, 157–158
bags, 156–157, 474
converting to single-instance learning, 472–474
dedicated methods, 475–476
hyperrectangles for, 476
nearest-neighbor learning adaptation to, 475
supervised, 156–157
upgrading learning algorithms, 475
Multi-instance problems, 53
ARFF file, 60f
converting to single-instance problem, 157
Multilabeled instances, 45
Multilayer perceptrons, 261–269
backpropagation, 264–269
bias, 263
datasets corresponding to, 262f
as feed-forward networks, 269
hidden layer, 263, 266, 267, 267f
input layer, 263
units, 263
MultilayerPerceptron algorithm, 284
Multinomial logistic regression, 396
Multinomial Naïve Bayes, 103
Multiple classes to binary transformation, 322–328, 324t
See also Data transformations
error-correcting output codes, 324–326
nested dichotomies, 326–328
one-vs.-rest method, 323
pairwise classification, 323
pairwise coupling, 323
simple methods, 323–324
Multiple linear regression, 491
Multiresponse linear regression, 129
drawbacks, 129
membership function, 129
Multistage decision property, 110

N

Naïve Bayes, 99, 289
classifier, 347
for document classification, 103–104
with EM, 469
independent attributes assumption, 469
locally weighted, 283
missing values, 100–103
multinomial, 103
numeric attributes, 100–103
selective, 295
semantics, 105
NaiveBayes algorithm, 160
NaiveBayesMultinomial algorithm, 160
NaiveBayesUpdateable algorithm, 566–567
NAND, 263
Nearest-neighbor classification, 85
speed, 141
Nearest-neighbor learning, 475
attribute selection, 290
Hausdorff distance variants and, 477
instance-based, 136
multi-instance data adaptation, 475
Nested dichotomies, 326–328
code matrix, 327t
defined, 327
ensemble of, 328
Neural networks, 445
approaches, 471–472
Neuron’s receptive field, 440
N-fold cross-validation, 169
N-grams, 403–404, 516
Nnge algorithm, 284
Noise, 7
“Noisy-OR” function, 476
Nominal attributes, 54
mixture model, 356–358
numeric prediction, 276
symbols, 54
Nonlinear class boundaries, 254–256
Nonparametric density models for classification, 362–363
Normal distribution
assumption, 103, 105
confidence limits, 166t
Normalization, 184, 408
Norm clipping, See Gradient clipping
NOT, 262
Novelty detection, See Outlier—detection of
Nuclear family, 50
Null hypothesis, 59
Numeric attributes, 54, 296–303
1R, 94
classification rules, 224
converting discrete attributes to, 303
decision tree, 210–212
discretization of, 287
Naïve Bayes, 100
normal-distribution assumption for, 105
Numeric prediction, 16, 44
additive regression, 490–493
bagging for, 483
evaluating, 194–197
linear models, 128–135
outcome as numeric value, 46
performance measures, 195t, 197t
support vector machine algorithms for, 256
Numeric prediction (local linear models), 273–284
building trees, 275
locally weighted linear regression, 281–283
model tree induction, 277–281
model trees, 274–275
nominal attributes, 276
pruning trees, 275–276
rules from model trees, 281
Numeric thresholds, 211
Numeric-attribute problems, 11

O

Obfuscate filter, 304–305, 525
Object editors, 553–554
Occam’s Razor, 197, 200, 489–490
One-class classification, See Outlier—detection of
One-class learning, 288, 319–320
multiclass classifiers, 320–321
outlier detection, 320–321
One-dependence estimator, 348–349
“One-hot” method, 393
OneR algorithm, 568
One-tailed probability, 166
One-vs.-rest method, 323
Option trees, 494–496
as alternating decision trees, 495, 495f
decision trees versus, 494
example, 494f
generation, 494–495
OR, 262
Order-independent rules, 119
Ordered classes, predictions for, 402
“Ordered logit” models, 402
Orderings, 54
circular, 56
partial, 56
Ordinal attributes, 55–56
coding of, 55–56
Orthogonal coordinate systems, 305
Outliers, 320
detection of, 320–321
Output
aggregating, 157
clusters, 87–88
instance-based representation, 84–87
knowledge representation, 91
linear models, 68–70
rules, 75–84
tables, 68
trees, 70–75
Overfitting, 95
for 1R, 95
backpropagation and, 268
forward stagewise additive regression and, 491–492
support vectors and, 255
Overfitting-avoidance bias, 35
Overlay data, 57

P

PageRank, 21, 504, 520–522
recomputation, 521
sink, 522
in Web mining, 521
Pair-adjacent violators (PAV) algorithm, 330
Paired t-test, 173
Pairwise classification, 323
Pairwise coupling, 323
Parabolas, 249
Parallelization, 507–508
Parameter initialization, 436–437
Parametric density models for classification, 362–363
Partial decision trees
best leaf, 230
building example, 230f
expansion algorithm, 229f
missing values, 230–231
obtaining rules from, 227–231
Partial least squares regression, 307–309
Partial ordering, 56
Partitioning
for 1R, 95
discretization, 94
instance space, 86f
training set, 213
Partition function, 385
Parzen window density estimation, 361
Perceptron learning rule, 132
illustrated, 132f
updating of weights, 134
Perceptrons, 133
instance presentation to, 133
kernel, 260–261
linear classification using, 131–133
multilayer, 261–269
voted, 261
Performance
classifier, predicting, 165
comparison, 162
error rate and, 163
evaluation, 162
instance-based learning, 246
for numeric prediction, 195t, 197t
predicting, 165
text mining, 515
Personal information use, 37–38
PKIDiscretize filter, 334
“Plate notation”, 370, 371
PLSFilter filter, 334
Poisson distribution, 357–358
Polynomial regression, 392–393
matrix vector formulations, 394–395
Posterior distribution, 337
Posterior predictive distribution, 367–368
Postpruning, 213
subtree raising, 214
subtree replacement, 214
Prediction
with Bayesian networks, 340–344
multiclass, 181
nodes, 495
outcomes, 180–181, 180t
three-class, 181t
two-class, 180t
Prepruning, 213
Pretraining deep autoencoders with RBMs, 448
Principal component analysis (PCA), 305–307, 372
of dataset, 306f
for dimensionality reduction, 377–378
principal components, 306
recursive, 307
Principal components regression, 307
PrincipalComponents filter, 334
Principle of multiple explanations, 200
Prior distribution, 337
clustering using, 358–359
Prior knowledge, 514
Prior probability, 98–99
PRISM rule-learning algorithm, 39, 110, 118–119
Probabilistic inference methods, 368–370
probability propagation, 368
sampling, simulated annealing, and iterated conditional modes, 368–370
variational inference, 370
Probabilistic LSA (pLSA), 376, 378–379
Probabilistic methods, 336
Bayesian estimation and prediction, 367–370
Bayesian networks, 339–352
clustering and probability density estimation, 352–363
conditional probability models, 392–403
factor graphs, 382–385
foundations, 336–339
graphical models, 370–391
hidden variable models, 363–367
maximum likelihood estimation, 338–339
maximum posteriori parameter estimation, 339
sequential and temporal models, 403–410
software packages and implementations, 414–415
Probabilistic principal component analysis (PPCA), 360–361, 372–376
EM for, 375–376
expected gradient for, 375
expected log-likelihood for, 374
inference with, 373–374
marginal log-likelihood for, 374
Probabilities
class, calibrating, 328–331
maximizing, 199
one-tailed, 166
predicting, 176–179
probability density function relationship, 177
with rules, 13
Probability density estimation, 352–363
clustering and, 352–363
comparing parametric, semiparametric and nonparametric density models, 362–363
expectation maximization algorithm, 353–356
extending mixture model, 356–358
kernel density estimation, 361–362
two-class mixture model, 354f
Probability density functions, 102
Probability estimates, 340
Probability propagation, 368
Probability theory, 336–337
Product rule, 337, 343–344
Programming by demonstration, 528
Projection, See Data projections
Projections
Fisher’s linear discriminant analysis, 311–312
independent component analysis, 309–310
linear discriminant analysis, 310
quadratic discriminant analysis, 310–311
random, 307
“Proportional odds” models, 402
Proportional k-interval discretization, 297–298
Pruning
cost-complexity, 220–221
decision trees, 213–215
example illustration, 216f
incremental reduced-error, 225, 226f
model trees, 275–276
noisy exemplars, 245–246
postpruning, 213
prepruning, 213
reduced-error, 215, 225
rules, 219
subtree lifting, 218
subtree raising, 214
subtree replacement, 213
Pruning sets, 224
Pseudoinverse, 394

Q

Quadratic discriminant analysis, 310–311
Quadratic loss function, 177–178

R

Race search, 294
RaceSearch method, 334
Radial basis function (RBF), 270
kernels, 256
networks, 256
output layer, 270
Random projections, 307
Random subspaces, 485
RandomCommittee algorithm, 501
RandomForest algorithm, 501
Randomization, 484–486
bagging versus, 485–486
rotation forests, 486
RandomSubSpace algorithm, 501
Ranker method, 564
Ratio quantities, 55
RBFNetwork algorithm, 284
RBMs, pretraining deep autoencoders with, 448
Recall-precision curves, 190
area under the precision-recall curve, 192
points on, 194
Reconstructive learning, 449
Rectangular generalizations, 86–87
Rectified linear units (ReLUs), 424–425
Rectify() function, 424–425
Recurrent neural networks, 269, 456–460
deep encoder-decoder recurrent network, 460f
exploding and vanishing gradients, 457–459
recurrent network architectures, 459–460
Recursive feature elimination, 290–291
Reduced-error pruning, 225, 269
incremental, 225, 226f
Reference density, 322
Reference distribution, 321
Regression, 68
additive, 490–493
isotonic, 330
kernel ridge, 258–259
linear, 16, 128–129
locally weighted, 281–283
logistic, 129–131
partial least-squares, 307–309
principal components, 307
robust, 317–318
support vector, 256–258
Regression equations, 75
linear regression, 16
linear regression equation, 16
Regression tables, 68
Regression trees, 72, 273–274
illustrated, 74f
Regularization, 273
Reidentification, 36–37
RELAGGS system, 477
Relations, 47–51
ancestor-of, 51
sister-of, 48f, 49t
superrelations, 50
Relation-valued attributes, 59
instances, 61
specification, 59
Relative absolute errors, 196
Relative squared errors, 195–196
RELIEF (Recursive Elimination of Features), 331
Repeated holdout, 167
Replicated subtree problem, 76
decision tree illustration, 77f
Representation learning techniques, 418
Reservoir sampling, 315–316
Residuals, 308
Restricted Boltzmann machines (RBMs), 451–452
Resubstitution errors, 163
RIPPER algorithm, 227, 228f, 234
Ripple-down rules, 234
Robo-soccer, 526
Robust regression, 317–318
ROC curves, 186–190
area under the curve, 191–192
from different learning schemes, 189
generating with cross-validation, 189
jagged, 188–189
points on, 194
sample, 188f
for two learning schemes, 189f
Rotation forests, 486
RotationForest algorithm, 501
Rote learning, 84–85
Row separation, 325
Rule sets
model trees for generating, 281
for noisy data, 222
Rules, 10, 75–84
antecedent of, 75
association, 11–12, 79–80, 234–241
classification, 11–12, 75–78
computer-generated, 19–21
consequent of, 75
constructing, 113–119
decision lists versus, 119
decision tree, 219
efficient generation of, 124–127
with exceptions, 80–82, 231–233
expert-derived, 19–21
expressive, 82–84
inferring, 93–96
from model trees, 281
order-independent, 119
perceptron learning, 132
popularity, 78
PRISM method for constructing, 118–119
probabilities, 13
pruning, 218, 219
ripple-down, 234
trees versus, 114

S

Sampling
with replacement, 315
reservoir, 315–316
procedure, 366, 368–370
without replacement, 315, 316
“Scaled” kernel function, 361
Schemata search, 294
Scheme-independent attribute selection, 289–292
filter method, 289–290
instance-based learning methods, 291
recursive feature elimination, 290–291
symmetric uncertainty, 291–292
wrapper method, 289–290
Scheme-specific attribute selection, 293–295
accelerating, 294–295
paired t-test, 294
race search, 294
results, 294
schemata search, 294
selective Naïve Bayes, 295
Scientific applications, 28
Screening images, 23–24
Search, generalization as, 31–35
Search bias, 34–35
Search engines, in web mining, 21–22
Search methods (Weka), 413, 564
Second-order analysis, 435
Seeds, 144
Selective Naïve Bayes, 295
Semantic relationship, 513
Semiparametric density models for classification, 362–363
Semisupervised learning, 467, 468–472
See also Multi-instance learning
clustering for classification, 468–470
co-EM, 471
cotraining, 470, 471
EM and, 471
neural network approaches, 471–472
Separate-and-conquer algorithms, 119, 289
Sequential and temporal models, 403–410
conditional random fields, 406–410
hidden Markov models, 404–405
Markov models, 403–404
N-gram methods, 403–404
Set kernel, 475
Shapes problem, 82
illustrated, 82f
training data, 83t
Sigmoid function, 264f
Sigmoid kernel, 256
SimpleCart algorithm, 242
SimpleKMeans algorithm, 160
SimpleLinearRegression algorithm, 160
SimpleMI algorithm, 160
Simple probabilistic modeling, 96–105
Simulated annealing, 369
Single-attribute evaluators, 564
Single-consequent rules, 126
Single-linkage clustering algorithm, 147, 149
Skewed datasets, 139
Sliding dot product, 440
Smoothing calculation, 274
“Sobel” filters, 441
Soft maximum, 475
Softmax function, 397
Soybean classification example, 19–21
dataset, 20t
examples rules, 19
Sparse data, 60–61
Splitter nodes, 495
Splitting, 152
clusters, 146
criterion, 275
model tree nodes, 277–278
Squared error, 178
Stacking, 319, 497–499
defined, 159, 497
level-0 model, 497–498
level-1 model, 497–498
model input, 497–498
output combination, 497
as parallel, 507–508
Standard deviation from the mean, 166
Standard deviation reduction (SDR), 275, 276–277
Standardizing statistical variables, 61
Statistical clustering, 296
Statistical modeling, 406
Statistics, machine learning and, 30–31
Step function, 264f
Stochastic backpropagation, 268–269
Stochastic deep networks, 449–456
See also Convolutional neural networks (CNNs)
Boltzmann machines, 449–451
categorical and continuous variables, 452–453
contrastive divergence, 452
deep belief networks, 455–456
deep Boltzmann machines, 453–454
restricted Boltzmann machines, 451–452
Stochastic gradient descent, 270–272
Stopwords, 313, 516
Stratification, 167
variation reduction, 168
Stratified holdout, 167
Stratified threefold cross-validation, 168
String attributes, 58
specification, 58
values, 59
StringToWordVector filter, 290, 563
Structural descriptions, 6–7
decision trees, 6
learning techniques, 9
Structure learning, 349
by conditional independence tests, 349
“Structured prediction” techniques, 407–408
Student’s distribution with k-1 degrees of freedom, 173–174
Student’s t-test, 173
Subgradients, 270–271
Subsampling, 444
Subtree lifting, 218
Subtree raising, 214
Subtree replacement, 213
Success rate, error rate and, 215–216
Sum rule, 337
Sum-product algorithms, 386–391
example, 389–390
marginal probabilities, 387
probable explanation example, 390
Super-parent one-dependence estimator, 348–349
Superrelations, 50
Supervised discretization, 297, 332
Supervised filters, 563
attribute, 563
instance, 563
using, 563
Supervised learning, 45
multi-instance learning, 472–476
Support, of association rules, 79, 120
Support vector machines (SVMs), 252, 403, 471
co-EM with, 471
hinge loss, 271
linear model usage, 252
term usage, 252
training, 253–254
weight update, 272
Support vector regression, 256–258
flatness maximization, 256–257
illustrated, 257f
for linear case, 257
linear regression differences, 256–257
for nonlinear case, 257
Support vectors, 253–254
finding, 254
overfitting and, 255
Survival functions, 402
Symmetric uncertainty, 291–292
Synthetic transformations, 437

T

Tables
as knowledge representation, 68
regression, 68
Tabular input format, 127
Teleportation, 522
Tenfold cross-validation, 169
TensorFlow, 464–465
Tensors, 420, 464–465
Testing, 163–164
test data, 163
test sets, 163
TestSetMaker, 566–567
Text mining, 515–519
conditional random fields for, 410
data mining versus, 515
document classification, 516
entity extraction, 517
information extraction, 517–518
metadata extraction, 517
performance, 515
stopwords, 516
Text summarization, 515
Text to attribute vectors, 313–314
Theano, 464
Theory, 197
exceptions to, 197
MDL principle and, 198
Threefold cross-validation, 168
3-point average recall, 191
“Time-homogeneous” models, 405
Time series, 314
Delta, 314
timestamp attribute, 314
Timestamp attribute, 314
Tokenization, 313
Top-down induction, of decision trees, 221
Torch, 465
Training, 163–164
data, 164
instances, 198
support vector machines, 261
Training sets, 162
error, 215
error rate, 163
partitioning, 213
TrainingSetMaker, 566–567
Tree diagrams, See Dendrograms
Tree-augmented Naïve Bayes (TAN), 348
Trees, 70–75
See also Decision trees
ball, 139, 139f
frequent-pattern, 235–239
functional, 71–72
Hoeffding, 511
kD, 136, 137, 137f
logistic model, 496–497
metric, 141
model, 74f, 75, 273
option, 494–496
regression, 72, 74f, 273
rules versus, 114
True negatives (TN), 180–181, 190–191
True positive rate, 186–188
True positives (TP), 180–181, 190–191
T-statistic, 174–176
T-test, 173
corrected resampled, 175–176
paired, 173
Two-class mixture model, 354f
Two-class problem, 82
Typographic errors, 63–64

U

Ubiquitous computing, 527
Ubiquitous data mining, 527–529
Unbalanced data, 64–65
Unmasking, 526–527
Unsupervised attribute filters, 563
See also Filters
Unsupervised discretization, 297–298
Unsupervised pretraining, 437
User Classifier (Weka), 72

V

Validation, 432–433
Validation data, 164
Validation sets, 508
for model selection, 201–202
Variables, standardizing, 61
Variance, 482
Variational bound, 370
Variational inference, 370
Variational parameters, 370
Venn diagrams, in cluster representation, 87–88
Visualization, in Weka, 562
Visualize panel, 562
Viterbi algorithms, 386
Voted perceptron, 261

W

Weather problem example, 10–12
alternating decision tree, 495f
ARFF file for, 58f
association rules, 11–12, 123t
attribute space, 292f
attributes evaluation, 94t
attributes, 10
Bayesian networks, 341f, 342f
clustering, 151f
counts and probabilities, 97t
data with numeric class, 47t
dataset, 11t
decision tree, 109f
expanded tree stumps, 108f
FP-tree insertion, 236t
identification codes, 111t
item sets, 121t
multi-instance ARFF file, 60f
numeric data with summary statistics, 101t
option tree, 494f
tree stumps, 106f
Web mining, 21–22, 519–522
PageRank algorithm, 520–522
search engines, 22
teleportation, 522
wrapper induction, 519–520
Weight decay, 269, 393, 399, 435
Weighting attributes
instance-based learning, 246–247
test, 247
updating, 246–247
Weights
determination process, 16
with rules, 13
Weka workbench, 504, 553–555
advanced setup, 570
association rules, 561–562
attribute selection, 562
clustering, 561–562
components configuration and connection, 566–567
development of, 553
evaluation components, 566
Experimenter, 554, 568–571
Explorer, 554, 557–564
filters, 554, 563
GUI Chooser panel, 556
how to use, 554–555
incremental learning, 567
interfaces, 554
ISO-8601 date/time format, 59
J48 algorithm, 558–559
Knowledge Flow, 554, 564–567
learning algorithms, 563
metalearning algorithms, 563
User Classifier facility, 72
visualization, 562
visualization components, 565, 567
Winnow, 133–135
Balanced, 134–135
linear classification with, 133–135
updating of weights, 134
versions illustration, 134f
Wisdom, 38
Wrapper induction, 519–520
Wrapper method, 289–290
Wrappers, 519

X

XML (eXtensible Markup Language), 57, 568
XOR (exclusive-OR), 262, 263
XRFF format, 57

Z

Zero-frequency problem, 178–179
ZeroR algorithm, 568, 569