for training instances,
137f
Kernel logistic regression,
261
computational expense,
259
computational simplicity,
259
Kernels, conditional probability models using,
402–403
K-nearest-neighbor method,
85
Knowledge Flow interface
components configuration and connection,
566–567
evaluation components,
566
incremental learning,
567
Knowledge representation,
91
L
Labor negotiations example,
16–18
Language identification,
516
Large item sets, finding with association rules,
240–241
Latent Dirichlet allocation (LDA),
379–381
Latent semantic analysis (LSA),
376–377
LatentSemanticAnalysis method,
376–377
LaTeX typesetting system,
571
Lattice-structured models,
408
Law of diminishing returns,
507
Lazy classifiers, in Weka,
563
Learning
in performance situations,
21
testing,
Learning Bayesian networks,
344–347
Least Absolute Shrinkage and Selection Operator (LASSO),
394
Least-squares linear regression,
70,
129
Leave-one-out cross-validation,
169
Linear chain conditional random fields,
408–409
Linear classification
in binary classification problems,
69
local, numeric prediction with,
273–284
nonlinear class boundaries,
254–256
stochastic gradient descent,
270–272
support vector machine use,
252
Linear discriminant analysis,
310
matrix vector formulations,
394–395
Linear threshold unit,
159
LinearForwardSelection method,
334
LinearRegression algorithm,
160
Locally weighted linear regression,
281–283
distance-based weighting schemes,
282
in nonlinear function approximation,
282
LogitBoost algorithm,
492,
493
Long short-term memory (LSTM),
457,
458
Loss functions
M
Machine learning
applications,
in diagnosis applications,
25
Manufacturing process applications,
27–28
Marketing and sales,
26–27
market basket analysis,
26–27
Marginal log-likelihood for PPCA,
374
Marginal probabilities,
387
Markov chain Monte Carlo methods,
368–369
Massive Online Analysis (MOA),
512
Max-product algorithms,
391
Maximum likelihood estimation,
338–339
Maximum a posteriori parameter estimation,
339
Mean-absolute errors,
195
Metadata
relations among attributes,
513
Metalearning algorithms, in Weka,
563
Metropolis–Hastings algorithm,
368–369
Mini-batch-based stochastic gradient descent,
433–434
Minimum description length (MDL) principle,
179,
197–200
probability theory and,
199
MIOptimalBall algorithm,
478
Missing values
machine learning schemes and,
63
Mixed-attribute problems,
11
Mixed National Institute of Standards and Technology (MNIST),
421–422,
421t
Mixture of Gaussians
expectation maximization algorithm,
353–356
of principal component analyzers,
360–361
Model trees
with nominal attributes,
279f
smoothing calculation,
274
Multiclass classification problem,
396
MultiClassClassifier algorithm,
334
Multiclass logistic regression,
396–400
Multiclass prediction,
181
Multi-instance learning
aggregating the input,
157
converting to single-instance learning,
472–474
nearest-neighbor learning adaptation to,
475
upgrading learning algorithms,
475
Multi-instance problems,
53
converting to single-instance problem,
157
Multilabeled instances,
45
Multilayer perceptrons
datasets corresponding to,
262f
as feed-forward networks,
269
MultilayerPerceptron algorithm,
284
Multinomial logistic regression,
396
Multinomial Naïve Bayes,
103
error-correcting output codes,
324–326
pairwise classification,
323
Multiple linear regression,
491
Multiresponse linear regression,
129
Multistage decision property,
110
N
Naïve Bayes
for document classification,
103–104
independent attributes assumption,
469
NaiveBayes algorithm,
160
NaiveBayesMultinomial algorithm,
160
NaiveBayesUpdateable algorithm,
566–567
Nearest-neighbor classification,
85
Nearest-neighbor learning,
475
Hausdorff distance variants and,
477
multi-instance data adaptation,
475
Neuron’s receptive field,
440
N-fold cross-validation,
169
Noise,
Nonlinear class boundaries,
254–256
Nonparametric density models for classification,
362–363
Normal distribution
Numeric attributes
classification rules,
224
converting discrete attributes to,
303
normal-distribution assumption for,
105
Numeric prediction,
16,
44
outcome as numeric value,
46
support vector machine algorithms for,
256
Numeric prediction (local linear models),
273–284
locally weighted linear regression,
281–283
rules from model trees,
281
Numeric-attribute problems,
11
O
One-tailed probability,
166
Option trees
as alternating decision trees,
495,
495f
decision trees versus,
494
Order-independent rules,
119
Ordered classes, predictions for,
402
“Ordered logit” models,
402
Ordinal attributes,
55–56
Orthogonal coordinate systems,
305
Output
instance-based representation,
84–87
knowledge representation,
91
forward stagewise additive regression and,
491–492
Overfitting-avoidance bias,
35
P
Pair-adjacent violators (PAV) algorithm,
330
Pairwise classification,
323
Parametric density models for classification,
362–363
Partial decision trees
expansion algorithm,
229f
Partial least squares regression,
307–309
Partitioning
Parzen window density estimation,
361
Perceptron learning rule,
132
instance presentation to,
133
linear classification using,
131–133
Performance
classifier, predicting,
165
instance-based learning,
246
Personal information use,
37–38
PKIDiscretize filter,
334
matrix vector formulations,
394–395
Posterior distribution,
337
Posterior predictive distribution,
367–368
Prediction
Pretraining deep autoencoders with RBMs,
448
Principal component analysis
for dimensionality reduction,
377–378
principal components,
306
Principal components regression,
307
PrincipalComponents filter,
334
Principle of multiple explanations,
200
Probabilistic inference methods,
368–370
probability propagation,
368
sampling, simulated annealing, and iterated conditional modes,
368–370
variational inference,
370
Probabilistic methods,
336
Bayesian estimation and prediction,
367–370
clustering and probability density estimation,
352–363
conditional probability models,
392–403
maximum likelihood estimation,
338–339
maximum a posteriori parameter estimation,
339
sequential and temporal models,
403–410
software packages and implementations,
414–415
Probabilistic principal component analysis (PPCA)
expected gradient for,
375
expected log-likelihood for,
374
marginal log-likelihood for,
374
Probabilities
probability density function relationship,
177
Probability density estimation,
352–363
comparing parametric, semiparametric and nonparametric density models,
362–363
expectation maximization algorithm,
353–356
two-class mixture model,
354f
Probability density functions,
102
Probability estimates,
340
Probability propagation,
368
Programming by demonstration,
528
Projections
Fisher’s linear discriminant analysis,
311–312
independent component analysis,
309–310
linear discriminant analysis,
310
quadratic discriminant analysis,
310–311
“Proportional odds” models,
402
Proportional k-interval discretization,
297–298
Pruning
example illustration,
216f
incremental reduced-error,
225,
226f
Q
Quadratic discriminant analysis,
310–311
R
Radial basis function (RBF),
270
RandomCommittee algorithm,
501
RandomForest algorithm,
501
RandomSubSpace algorithm,
501
RBFNetwork algorithm,
284
RBMs, pretraining deep autoencoders with,
448
Recall-precision curves,
190
area under the precision-recall curve,
192
Reconstructive learning,
449
Rectangular generalizations,
86–87
Rectified linear units (ReLUs),
424–425
Recurrent neural networks
deep encoder-decoder recurrent network,
460f
exploding and vanishing gradients,
457–459
recurrent network architectures,
459–460
Recursive feature elimination,
290–291
Reduced-error pruning,
225,
269
Reference distribution,
321
principal components,
307
Linear regression equation,
16
Relation-valued attributes,
59
Relative absolute errors,
196
RELIEF (Recursive Elimination of Features),
331
Replicated subtree problem,
76
decision tree illustration,
77f
Representation learning techniques,
418
Restricted Boltzmann machines (RBMs),
451–452
Resubstitution errors,
163
ROC curves
from different learning schemes,
189
generating with cross-validation,
189
for two learning schemes,
189f
RotationForest algorithm,
501
Rule sets
model trees for generating,
281
Rules
computer-generated,
19–21
decision lists versus,
119
PRISM method for constructing,
118–119
S
“Scaled” kernel function,
361
Scheme-independent attribute selection,
289–292
instance-based learning methods,
291
recursive feature elimination,
290–291
Scheme-specific attribute selection,
293–295
selective Naïve Bayes,
295
Scientific applications,
28
Search, generalization as,
31–35
Search engines, in web mining,
21–22
Search methods (Weka),
413,
564
Second-order analysis,
435
Selective Naïve Bayes,
295
Semantic relationship,
513
Semiparametric density models for classification,
362–363
Semisupervised learning
clustering for classification,
468–470
Separate-and-conquer algorithms,
119,
289
Sequential and temporal models,
403–410
SimpleCart algorithm,
242
SimpleKMeans algorithm,
160
SimpleLinearRegression algorithm,
160
Simple probabilistic modeling,
96–105
Single-attribute evaluators,
564
Single-consequent rules,
126
Single-linkage clustering algorithm,
147,
149
Smoothing calculation,
274
Soybean classification example,
19–21
Standard deviation from the mean,
166
Standardizing statistical variables,
61
Statistical clustering,
296
Statistical modeling,
406
Statistics, machine learning and,
30–31
Stochastic backpropagation,
268–269
Stochastic deep networks
categorical and continuous variables,
452–453
contrastive divergence,
452
restricted Boltzmann machines,
451–452
Stochastic gradient descent,
270–272
Stratified threefold cross-validation,
168
StringToWordVector filter,
290,
563
Structural descriptions,
6–7
decision trees,
learning techniques,
Structure learning by conditional independence tests,
349
“Structured prediction” techniques,
407–408
Student’s distribution with k-1 degrees of freedom,
173–174
Success rate, error rate and,
215–216
Sum-product algorithms
marginal probabilities,
387
probable explanation example,
390
Super-parent one-dependence estimator,
348–349
Supervised discretization,
297,
332
Support, of association rules,
79,
120
Support vector machines (SVMs),
252,
403,
471
linear regression differences,
256–257
Synthetic transformations,
437
T
Tables
as knowledge representation,
68
Tabular input format,
127
Tenfold cross-validation,
169
Text mining
conditional random fields for,
410
document classification,
516
Threefold cross-validation,
168
3-point average recall,
191
“Time-homogeneous” models,
405
Top-down induction, of decision trees,
221
support vector machines,
261
Tree-augmented Naïve Bayes (TAN),
348
Two-class mixture model,
354f
Typographic errors,
63–64
U
Ubiquitous computing,
527
Unsupervised discretization,
297–298
Unsupervised pretraining,
437
User Classifier (Weka),
72
V
Variables, standardizing,
61
Variational inference,
370
Variational parameters,
370
Venn diagrams, in cluster representation,
87–88
Visualization, in Weka,
562
W
Weather problem example,
10–12
alternating decision tree,
495f
attributes evaluation,
94t
counts and probabilities,
97t
data with numeric class,
47t
expanded tree stumps,
108f
identification codes,
111t
multi-instance ARFF file,
60f
numeric data with summary statistics,
101t
Weighting attributes
Weights
determination process,
16
Weka workbench
components configuration and connection,
566–567
evaluation components,
566
incremental learning,
567
ISO-8601 date/time format,
59
metalearning algorithms,
563
User Classifier facility,
72
visualization components,
565,
567
Winnow algorithm
linear classification with,
133–135
versions illustration,
134f
X
XML (eXtensible Markup Language),
57,
568
Z