Index

Note: Page numbers followed by “f” and “t” refer to figures and tables, respectively.

0–9 and Symbols

0–1 loss function, 176
0.632 bootstrap, 170
1R (1-rule), 93
discretization, 296
example use, 94t
missing values and numeric data, 94–96
overfitting for, 95
pseudocode, 93f
11-point average recall, 191

A

Accuracy, of association rules, 79, 120
minimum, 79, 122, 124
Accuracy, of classification rules, 102, 115
Activation functions, 270, 424–426, 425t
Acuity parameter, 152
AdaBoost, 487–489
AdaBoost.M1 algorithm, 487
Additive logistic regression, 492–493
Additive regression, 490–493
ADTree algorithm, 501
Adversarial data mining, 524–527
Agglomerative clustering, 142, 147
Aggregation, 438
Akaike Information Criterion (AIC), 346
AlexNet model, 435
All-dimensions (AD) trees, 350–351
generation, 351
illustrated examples, 350f
Alternating decision trees, 495
example, 495f, 496
prediction nodes, 495
splitter nodes, 495
Analysis of variance (ANOVA), 393
Analyze panel, 568, 570–571
Ancestor-of relation, 51
AND, 262
Anomalies, detecting, 318–319
Antecedent, of rule, 75
Applications, 503
automation, 28
challenge of, 503
data stream learning, 509–512
diagnosis, 25–26
fielded, 21–28
incorporating domain knowledge, 512–515
massive datasets, 506–509
text mining, 515–519
Apriori algorithm, 234–235
Area under the curve (AUC), 191–192
Area under the precision-recall curve (AUPRC), 192
ARFF files, 57
attribute specifications in, 58
attribute types in, 58
defined, 57
illustrated, 58f
Arithmetic underflow, 344–345
Aspect model, 378–379
Assignment of key phrases, 516
Association learning, 44
Association rules, 11–12, 79–80
See also Rules
accuracy (confidence), 79, 120
characteristics, 79
computation requirement, 127
converting item sets to, 122
coverage (support), 79, 120
double-consequent, 125–126
examples, 11–12
finding, 120
finding large item sets, 240–241
frequent-pattern tree, 235–239
mining, 120–127
predicting multiple consequences, 79
relationships between, 80
single-consequent, 126
in Weka, 561
Attribute evaluation methods, 562
attribute subset evaluators, 564
single-attribute evaluators, 564
Attribute filters, 563
supervised, 563
unsupervised, 563
Attribute selection, 287, 288–295
See also Data transformations
backward elimination, 292–293
beam search, 293
best-first search, 293
filter method, 289–290
forward selection, 292–293
instance-based learning methods, 291
race search, 294
recursive feature elimination, 290–291
schemata search, 294
scheme-independent, 289–292
scheme-specific, 293–295
searching the attribute space and, 292–293
selective Naïve Bayes, 295
symmetric uncertainty, 291–292
in Weka, 562
Weka evaluation methods for, 562
wrapper method, 289–290
Attribute subset evaluators, 564
Attribute-efficient learners, 135
Attributes, 43, 53–54, 95
ARFF format, 58
Boolean, 55–56
causal relations, 513
combination of, 120
conversions, 94
date, 58
difference, 135–136
discrete, 55–56
evaluating, 94t
highly branching, 110–113
identification code, 95
interval, 55
irrelevant, 289
nominal, 54, 357
normalized, 61
numeric, 54, 210–212
ordinal, 55
ratio, 55
relations between, 83
relation-valued, 58
relevant, 289
semantic relation between, 513
string, 58, 313
string, conversion, 313
types of, 44, 61–62
values of, 53–54
weighting, 246–247
Authorship ascription, 516
AutoClass, 156, 359
Bayesian clustering scheme, 359–360
Autoencoders, 445–449
combining reconstructive and discriminative learning, 449
denoising autoencoders, 448
layerwise training, 448
pretraining deep autoencoders with RBMs, 448
Automation applications, 28
Averaged one-dependence estimator (AODE), 348–349
Average-linkage method, 147–148

B

Background knowledge, 508
Backpropagation, 263, 426–429
checking implementations, 430–431
stochastic, 268–269
Backward elimination, 292–293
Backward pruning, 213
Bagging, 480
algorithm for, 483f
bias-variance decomposition, 482–483
with costs, 483–484
idealized procedure versus, 483
instability neutralization, 482–483
for numeric prediction, 483
as parallel, 508
randomization versus, 485–486
Bagging algorithm, 480, 481–484
Bags, 156–157
class labels, 157
instances, joining, 474
positive, 475–476
positive probability, 476
Balanced iterative reducing and clustering using hierarchies (BIRCH), 160
Balanced Winnow, 134–135
Ball trees, 139
in finding nearest neighbors, 140
illustrated, 139f
nodes, 139–140
splitting method, 140–141
two cluster centers, 145f
Batch learning, 268–269
Batch normalization, 436
Bayes Information Criterion, 159–160
Bayes’ rule, 337, 339, 362–363
Bayesian clustering, 358–359
AutoClass, 359
DensiTree, 359, 360f
hierarchical, 359
Bayesian estimation and prediction, 367–370
probabilistic inference methods, 368–370
Bayesian Latent Dirichlet allocation (LDAb), 379–380
Bayesian multinet, 349
Bayesian networks, 158, 339–352, 382–385
AD tree, 350–351, 350f
algorithms, 347–349
conditional independence, 343–344
data structures for fast learning, 349–352
EM algorithm to, 366–367
example illustrations, 341f, 342f
for weather data, 341f, 342f
K2 algorithm, 411
learning, 344–347
making predictions, 340–344
Markov blanket, 347–348
prior distribution over network structures, 346–347
specific algorithms, 347–349
structure learning by conditional independence tests, 349
TAN, 348
BayesNet algorithm, 416
Beam search, 293
Belief propagation, See Probability propagation
Bernoulli process, 165
BestFirst method, 334
Best-first search, 295
Bias, 33–35
language, 33–34
multilayer perceptron, 263
overfitting-avoidance, 35
search, 34–35
Bias-variance decomposition, 482–483
Binary classification problems, 69
Binary events, 337
Bits, 106–107
Block Gibbs sampling, 454
Boltzmann machines, 449–451
Boolean attributes, 55–56
Boolean classes, 78
Boosting, 486–490
AdaBoost, 487–489
algorithm for, 487, 488f
classifiers, 490
in computational learning theory, 489
decision stumps, 490
forward stagewise additive modeling, 491
power of, 489–490
Bootstrap, 169–171
Bootstrap aggregating, See Bagging
Box kernel, 361
“Burn-in” process, 369
“Business understanding” phase, 28–29

C

C4.5
functioning of, 219
MDL-based adjustment, 220
C5.0, 221
Caffe, 465
Calibration, class probability, 330
discretization-based, 331
logistic regression, 330
PAV-based, 331
Capabilities class, 561
CART system, 210, 283
cost-complexity pruning, 220–221
Categorical and continuous variables, 452–453
Categorical attributes, See Nominal attributes
Category utility, 142, 154–156
calculation, 154
incremental clustering, 150–154
Causal relations, 513
CBA technique, 241
CfsSubsetEval method, 334
Chain rule, 327, 343–344
Chain-structured conditional random fields, 410
Circular ordering, 56
CitationKNN algorithm, 478
Class boundaries
non-axis parallel, 251
rectangular, 248–249, 249f
Class labels
bags, 157
reliability, 506
Class noise, 317
Class probability estimation, 321
dataset with two classes, 329, 329f
difficulty, 328–329
overoptimistic, 329
ClassAssigner component, 564–565
ClassAssigner filter, 564–565, 567
Classes, 45
Boolean, 78
membership functions for, 129
rectangular, 248–249, 249f
Classical machine learning techniques, 418
Classification, 44
clustering for, 468–470
cost-sensitive, 182–183, 484
document, 516
k-nearest-neighbor, 85
Naïve Bayes for, 103–104
nearest-neighbor, 85
one-class, 319
pairwise, 323
Classification learning, 44
Classification rules, 11–12, 75–78
See also Rules
accuracy, 224
antecedent of, 75
criteria for choosing tests, 221–222
disjunctive normal form, 78
with exceptions, 80–82
exclusive-or, 76, 77f
global optimization, 226–227
good rule generation, 224–226
missing values, 223–224
multiple, 78
numeric attributes, 224
from partial decision trees, 227–231
producing with covering algorithms, 223
pruning, 224
replicated subtree, 76, 77f
RIPPER rule learner, 227, 228f, 234
ClassifierPerformanceEvaluator, 565, 567
ClassifierSubsetEval method, 334
Classify panel, 558, 559, 563
classification error visualization, 559
Cleansing
artificial data generation, 321–322
detecting anomalies, 318–319
improving decision trees, 316–317
one-class learning, 319–320
outlier detection, 320–321
robust regression, 317–318
“Cliques”, 385
Closed-world assumptions, 47, 78
CLOSET+ algorithm, 241
Clustering, 44, 141–156, 352–363, 473
agglomerative, 142, 147
algorithms, 87–88
category utility, 142
comparing parametric, semiparametric and nonparametric density models, 362–363
with correlated attributes, 359–361
document, 516
EM algorithm, 353–356
evaluation, 200
expectation maximization algorithm, 353–356
extending mixture model, 356–358
for classification, 468–470
group-average, 148
hierarchical, 147–148
in grouping items, 45
incremental, 150–154
iterative distance-based, 142–144
k-means, 144
MDL principle application to, 200–201
number of clusters, 146–147
using prior distributions, 358–359
and probability density estimation, 352–363
representation, 88f
statistical, 296
two-class mixture model, 354f
in Weka, 561
Cobweb algorithm, 142, 160, 561–562
Co-EM, 471
“Collapsed Gibbs sampling”, 380–381
Column separation, 325
Comma-separated value (CSV)
data files, 558
format, 558
Complete-linkage method, 147
Computation graphs and complex network structures, 429–430
Computational learning theory, 489
Computational Network Toolkit (CNTK), 465
Computer-Assisted Passenger Prescreening System (CAPPS), 526
Concept descriptions, 43
Concepts, 44–46
See also Input
defined, 43
“Condensed” representation, 473
Conditional independence, 343–344
Conditional probability models, 392–403
generalized linear models, 400–401
gradient descent and second-order methods, 400
using kernels, 402–403
linear and polynomial regression, 392–393
multiclass logistic regression, 396–400
predictions for ordered classes, 402
using priors on parameters, 393–395
matrix vector formulations of linear and polynomial regression, 394–395
Conditional random fields, 406–410
chain-structured conditional random fields, 410
linear chain conditional random fields, 408–409
from Markov random fields to, 407–408
for text mining, 410
Confidence
of association rules, 79, 120
intervals, 173–174
upper/lower bounds, 246
Confidence limits
in error rate estimation, 215–217
for normal distribution, 166t
for Student’s distribution, 174t
on success probability, 246
Confusion matrix, 181
Consequent, of rule, 75
ConsistencySubsetEval method, 334
Constrained quadratic optimization, 254
Contact lens problem, 12–14
covering algorithm, 115–119
rules, 13f
structural description, 14, 14f
Continuous attributes, See Numeric attributes
Contrastive divergence, 452
Convex hulls, 253
Convolution, 440, 441
Convolutional neural networks (CNNs), 419, 437–438
convolutional layers and gradients, 443–444
deep convolutional networks, 438–439
from image filtering to learnable convolutional layers, 439–443
ImageNet evaluation, 438–439
implementation, 445
pooling and subsampling layers and gradients, 444
Corrected resampled t-test, 175–176
Cost curves, 192–194
cost in, 193
cost matrixes, 182, 182t, 186
Cost of errors, 179–180
cost curves, 192–194
cost-sensitive classification, 182–183
cost-sensitive learning, 183
examples, 180
lift charts, 183–186
problem misidentification, 180
recall-precision curves, 190
ROC curves, 186–190
Cost–benefit analyzer, 186
Cost-complexity pruning, 220–221
Cost-sensitive classification, 182–183, 484
Cost-sensitive learning, 183
two-class, 183
Co-training, 470
EM and, 471
Counting the cost, 179–194
Covariance matrix, 356–357
Coverage, of association rules, 79, 120
minimum, 124
specifying, 127
Covering algorithms, 113–119
example, 115
illustrated, 113f
instance space during operation of, 115f
operation, 115
in producing rules, 223
in two-dimensional space, 113–114
CPU performance, 16
dataset, 16t
Cross-correlation, 440, 441
Cross-validation, 167–168, 432–433
estimates, 173
folds, 168
leave-one-out, 169
repeated, 175–176
for ROC curve generation, 189
stratified threefold, 168
tenfold, 168, 286–287
threefold, 168
CrossValidationFoldMaker, 565, 567
CuDNN, 465–466
Customer support/service applications, 28
Cutoff parameter, 154

D

Data, 38
augmentation, 437
evaluation phase, 29–30
linearly separable, 131–132
noise, 7
overlay, 57
scarcity of, 529
sparse, 60–61
structures for fast learning, 349–352
Data cleansing, 65, 288, 316–322
See also Data transformations
anomaly detection, 318–319
decision tree improvement, 316–317
methods, 288
one-class learning, 319–320
robust regression, 317–318
Data mining, 5, 6, 9, 28–30
adversarial, 524–527
applying, 504–506
as data analysis, 5
ethics and, 35–38
learning machine and, 4–9
life cycle, 29f
scheme comparison, 172–176
ubiquitous, 527–529
Data preparation
See also Input
ARFF files, 57–60
attribute types, 61–62
data gathering in, 56–57
data knowledge and, 65
inaccurate values in, 63–64
missing values in, 62–63
sparse data, 60–61
Data projections, 287, 304–314
partial least-squares regression, 307–309
principal components analysis, 305–307
random, 307
text to attribute vectors, 313–314
time series, 314
Data stream learning, 509–512
algorithm adaptation for, 510
Hoeffding bound, 510
memory usage, 511–512
Naïve Bayes for, 510
tie-breaking strategy, 511
Data transformations, 285
attribute selection, 288–295
data cleansing, 288, 316–322
data projection, 287, 304–314
discretization of numeric attributes, 287, 296–303
input types and, 305
methods for, 287
multiple classes to binary ones, 288–289, 315–316
sampling, 288, 315–316
“Data understanding” phase, 28–29
Data warehousing, 56–57
Data-dependent expectation, 451
DataSet connections, 566–567
Date attributes, 58
Decimation, 438
Decision boundaries, 69
Decision lists, 11
rules versus, 119
Decision stumps, 490
Decision tree induction, 30, 316
complexity, 217–218
top-down, 221
Decision trees, 6, 70–71, 109f
alternating, 495, 495f, 496
C4.5 algorithm and, 219–220
constructing, 105–113
cost-complexity pruning, 220–221
for disjunction, 76f
error rate estimation, 215–217
examples, 14f, 18f
highly branching attributes, 110–113
improving, 316–317
information calculation, 108–110
missing values, 71, 212–213
nodes, 70–71
numeric attributes, 210–212
partial, obtaining rules from, 227–231
pruning, 213–215
with replicated subtree, 77f
rules, 219
in Weka, 558–559
DecisionStump algorithm, 490
DecisionTable algorithm, 334
Dedicated multi-instance methods, 475–476
Deep belief networks, 455–456
Deep Boltzmann machines, 453–454
Deep feedforward networks, 420–431
activation functions, 424–426, 425t
backpropagation, 426–429
checking implementations, 430–431
computation graphs and complex network structures, 429–430
deep layered network architecture, 423–424
feedforward neural network, 424f
losses and regularization, 422–423
MNIST evaluation, 421–422, 421t
Deep layered network architecture, 423–424
Deep learning, 418
autoencoders, 445–449
deep feedforward networks, 420–431
recurrent neural networks, 456–460
software and network implementations, 464–466
stochastic deep networks, 449–456
techniques, 418
three-layer perceptron, 419
training and evaluating deep networks, 431–437
batch normalization, 436
cross-validation, 432–433
data augmentation and synthetic transformations, 437
dropout, 436
early stopping model, 431–432
hyperparameter tuning, 432–433
learning rates and schedules, 434–435
mini-batch-based stochastic gradient descent, 433–434
parameter initialization, 436–437
pseudocode for mini-batch based stochastic gradient descent, 434, 435f
regularization with priors on parameters, 435
unsupervised pretraining, 437
validation, 432–433
Deeplearning4j, 465
Delta, 314
Dendrograms, 87–88, 147
Denoising autoencoders, 448
Denormalization, 50
problems with, 51
DensiTree, 359, 360f
visualization, 359, 360f
Diagnosis applications, 25–26
faults, 25–26
machine language in, 25
performance tests, 26
Difference attributes, 135–136
Dimensionality reduction, PCA for, 377–378
Direct marketing, 27
Directed acyclic graphs, 340
Discrete attributes, 55–56
Discrete events, 337
Discretization, 287, 296–303
See also Data transformations
1R (1-rule), 296
converting to numeric attributes, 303
decision tree learners, 296
entropy-based, 298–301
error-based, 301
global, 296
partitioning, 94–95
proportional k-interval, 297–298
supervised, 297
unsupervised, 297
Discretization-based calibration, 330
Discriminative learning, 449
Disjunctive normal form, 78
Distance functions, 135–136
difference attributes, 135–136
generalized, 250
for generalized exemplars, 248–250
missing values, 136
Diverse-density method, 475–476
Divide-and-conquer, 105–113, 289
Document classification, 516
See also Classification
in assignment of key phrases, 516
in authorship ascription, 516
in language identification, 516
as supervised learning, 516
Document clustering, 516
Domain knowledge, 19
Double-consequent rules, 126
Dropout, 436
Dynamic Bayesian network, 405

E

Early stopping, 266, 267–268
model, 431–432
Eigenvalues, 306
Eigenvectors, 306
“Elastic net” approach, 394
EM algorithm, 416
EM for PPCA, 375–376
END algorithm, 334
“Empirical Bayesian” methods, 368
Empirical risk, 422–423
Ensemble learning, 479
additive regression, 490–493
bagging, 481–484
boosting, 486–490
interpretable ensembles, 493–497
multiple models, 480–481
randomization, 484–486
stacking, 497–499
Entity extraction, in text mining, 517
Entropy, 110
Entropy-based discretization, 298–301
error-based discretization versus, 301
illustrated, 299f
with MDL stopping criterion, 301
results, 299f
stopping criteria, 293, 300
Enumerated attributes, 55–56
Enumerating concept space, 32–33
Equal-frequency binning, 297
Equal-interval binning, 297
Error rate, 163
decision tree, 215–217
repeated holdout, 167
success rate and, 215–216
training set, 163
Error-based discretization, 301
Errors
estimation, 172
inaccurate values and, 63–64
mean-absolute, 195
mean-squared, 195
propagation, 266, 267–268
relative-absolute, 195
relative-squared, 195–196
resubstitution, 163
squared, 177
training set, 163
Estimation error, 172
Ethics, 35–38
issues, 35
personal information and, 37–38
reidentification and, 36–37
Euclidean distance, 135
between instances, 149
function, 246–247
Evaluation
clustering, 200–201
as data mining key, 161–162
numeric prediction, 194–197
performance, 162
Examples, 46–53
See also Instances; specific examples
class of, 45
relations, 47–51
structured, 51
types of, 46–53
Exceptions, rules with, 80–82, 231–233
Exclusive-or problem, 77f
Exclusive-OR (XOR), 262
Exemplars, 245
generalizing, 247–248
noisy, pruning, 245–246
reducing number of, 245
Exhaustive error-correcting codes, 326
ExhaustiveSearch method, 496
Expectation, 357
Expectation maximization (EM) algorithm, 353–356, 365–366, 468
and cotraining, 471
maximization step, 469
with Naïve Bayes, 469
to train Bayesian networks, 366–367
Expected gradients, 364–365
for PPCA, 375
Expected log-likelihoods, 364–365
for PPCA, 374
Experimenter, 554, 568–571
See also Weka workbench
advanced setup, 570
Analyze panel, 568, 570–571
results analysis, 569–570
Run panel, 568
running experiments, 568–569
Setup panel, 568, 571
simple setup, 570
starting up, 568–570
Expert models, 480
Explorer, 554, 557–564
See also Weka workbench
ARFF format, 560
Associate panel, 561–562
association-rule learning, 234–241
attribute selection, 564
automatic parameter tuning, 171–172
Classify panel, 558
Cluster panel, 561
clustering algorithms, 141–156
CSV data files, 558
decision tree building, 558–559
filters, 560–561, 563
introduction to, 557–564
learning algorithms, 563
loading datasets, 557–558, 560–561
metalearning algorithms, 558
models, 559
Preprocess panel, 559, 560
search methods, 564
Select Attributes panel, 562, 564
Visualize panel, 553, 562
EXtensible Markup Language (XML), 57, 568

F

Factor analysis, 373
Factor graphs, 382–385
Bayesian networks, 382–385
logistic regression model, 382–385
Markov blanket, 383f
False negatives (FN), 180–181, 182, 191t
False positive rate, 180–181
False positives (FP), 180–181, 182, 191t
Familiar system, 528
Feature map, 439–440
Feature selection, 331–333
Feedforward networks, 269, 270
feedforward neural network, 424f
Fielded applications, 21–28
automation, 28
customer service/support, 28
decisions involving judgments, 22–23
diagnosis, 22–23
image screening, 23–24
load forecasting, 24–25
manufacturing processes, 27–28
marketing and sales, 26–27
scientific, 28
web mining, 21–22
File mining, 53
Files
ARFF, 58, 59–60
filtering, 560–561
loading, 560–561
opening, 560
Filter method, 289–290
FilteredClassifier algorithm, 563
FilteredClassifier metalearning scheme, 563
Filtering approaches, 319
Filters, 554, 563
applying, 561
attribute, 562, 563, 564
information on, 561
instance, 563
supervised, 563, 567
unsupervised, 563, 567
in Weka, 559
Finite mixtures, 353
Fisher’s linear discriminant analysis, 311–312
Fixed set, 54, 510
Flat files, 46–47
F-measure, 191, 202–203
Forward pruning, 213
Forward selection, 292–293
Forward stagewise additive modeling, 491
implementation, 492
numeric prediction, 491–492
overfitting and, 491–492
residuals, 491
Forwards-backwards algorithms, 386
FP-growth algorithm, 235, 241
Frequent-pattern trees, 242
building, 235–239
compact structure, 235
data preparation example, 236t
header tables, 237
implementation, 241
structure illustration, 239f
support threshold, 240
Functional dependencies, 513
Functional trees, 71–72
Fundamental rule of probability, See Product rule

G

Gain ratio, 111–112
Gaussian distributions, 373, 394
Gaussian kernel, 361
Gaussian process regression, 272
Generalization
exemplar, 247–248, 251–252
instance-based learning and, 251
stacked, 497–499
Generalization as search, 31–35
bias, 33–35
enumerating the concept space, 32–33
Generalized distance functions, 250
Generalized linear models, 400–401
link functions, mean functions, and distributions, 401t
Generalized Sequential Patterns (GSP), 241
Generalizing exemplars, 247–248
distance functions for, 248–250
nested, 248
Generative models, 371
Gibbs sampling, 368–369
Global optimization, classification rules for, 226–227
Gradient ascent, 476
Gradient clipping, 457–458
Gradient descent, 266, 267–268
illustrated, 265f
and second-order methods, 400
stochastic, 270–272
subgradients, 270–271
Graphical models, 352, 370–391
computing using sum-product and max-product algorithms, 386–391
factor graphs, 382–385
Markov random fields, 385–386
PCA for dimensionality reduction, 377–378
and plate notation, 371
PPCA, 372–376
probabilistic LSA, 378–379
Graphics processing units (GPUs), 392
GraphViewer, 565
Greedy method, for rule pruning, 219
GreedyStepwise method, 334
Group-average clustering, 148
Growing sets, 224

H

Hamming distance, 325
Hausdorff distance, 475, 477
Hidden attributes, 340
Hidden layer, multilayer perceptrons, 263, 266, 267–268, 267f
Hidden Markov models, 404–405
Hidden variable models, 363–367
EM algorithm, 365–366
to train Bayesian networks, 366–367
expected gradients, 364–365
expected log-likelihoods, 364–365
Hidden variables, 355–356, 363
Hierarchical clustering, 147–148, 359
See also Clustering
agglomerative, 147
average-linkage method, 147–148
centroid-linkage method, 147–148
dendrograms, 147
displays, 149f
example, 148–150
example illustration, 153f
group-average, 148
single-linkage algorithm, 147, 150
HierarchicalClusterer algorithm, 160
Highly branching attributes, 110–113
Hinge loss, 271, 271f
Histogram equalization, 297
Hoeffding bound, 510
Hoeffding trees, 510
Hyperparameter
selection, 171–172
tuning, 432–433
Hyperplanes, 252–253
maximum-margin, 253–254
separating classes, 253f
Hyperrectangles, 247–248
boundaries, 247–248
exception, 248
measuring distance to, 250
in multi-instance learning, 477
overlapping, 248
Hyperspheres, 139
HyperText Markup Language (HTML)
delimiters, 519–520
formatting commands, 519

I

IB1 algorithm, 160
IBk algorithm, 284
Id3 algorithm, 160
ID3 decision tree learner, 113
Identification code attributes, 95
example, 111t
Image screening, 23–24
hazard detection system, 23
input, 23–24
problems, 24
ImageNet evaluation, 438–439
ImageNet Large Scale Visual Recognition Challenge (ILSVRC), 438–439
Inaccurate values, 63–64
Incremental clustering, 150–154
acuity parameter, 152–154
category utility, 150, 151
cutoff parameter, 154
example illustrations, 151f, 153f
merging, 151–152
splitting, 152
Incremental learning, 567
Incremental reduced-error pruning, 225, 226f
IncrementalClassifierEvaluator, 567
Independent and identically distributed (i.i.d.), 338
Independent component analysis, 309–310
Inductive logic programming, 84
Information, 37–38, 106–107
calculating, 108–110
extraction, 517–518
gain calculation, 222
measure, 108–110
value, 110
Informational loss function, 178–179
Information-based heuristics, 223
Input, 43
aggregating, 157
ARFF format, 57–60
attribute types, 61–62
attributes, 53–56
concepts, 44–46
data assembly, 56–57
data transformations and, 304
examples, 46–53
flat files, 46–47
forms, 43
inaccurate values, 63–64
instances, 46–53
missing values, 62–63
preparing, 56–65
sparse data, 60–61
tabular format, 127
Input layer, multilayer perceptrons, 263
Instance connections, 566–567
Instance filters, 563
Instance space
in covering algorithm operation, 115f
partitioning methods, 130f
rectangular generalizations in, 86–87
Instance-Based Learner version 3 (IB3), 246
Instance-based learning, 84–85, 135–141
in attribute selection, 291
characteristics, 84–85
distance functions, 135–136
for generalized exemplars, 248–250
explicit knowledge representation and, 251
generalization and, 244–252
generalizing exemplars, 247–248
nearest-neighbor, 136–141
performance, 245–246
pruning noise exemplars, 245–246
reducing number of exemplars, 245
visualizing, 87
weighting attributes, 246–247
Instance-based representation, 84–87
Instances, 43, 46–47
centroid, 142–143
misclassified, 132–133
with missing values, 212–213
multilabeled, 45
order, 59–60
sparse, 61
subset sort order, 212
training, 198
Interpretable ensembles, 493–497
logistic model trees, 496–497
option trees, 494–496
Interval quantities, 55
Iris example, 14–15
data as clustering problem, 46t
dataset, 15t
decision boundary, 69, 70f
decision tree, 72, 73f
hierarchical clusterings, 153f
incremental clustering, 150–154
rules, 15
rules with exceptions, 80–82, 81f, 231–233, 232f
Isotonic regression, 330
Item sets, 120–121
checking, of two consecutive sizes, 126
converting to rules, 122
in efficient rule generation, 124–127
example, 121t
large, finding with association rules, 240–241
minimum coverage, 124
subsets of, 124–125
Items, 120
Iterated conditional modes procedure, 369–370
Iterative distance-based clustering, 142–144

J

J48 algorithm, 558, 565, 567, 568
cross-validation with, 565
Java virtual machine, 508–509
Joint distribution, 367, 452–453
Judgment decisions, 22–23

K

K2 algorithm, 411
K2 learning algorithm, 347
Kappa statistic, 181
KD-trees, 136
building, 137