Index

A

abstraction, 6, 8, 62, 63

accuracy, of linear regression, 230–232

activation functions

hyperbolic tangent function, 278–279

identity function, 276

rectified linear unit function, 277

sigmoid function. See sigmoid function

threshold/step function, 276

active learning

heuristics, 305–306

query strategies, 306

AdaBoost. See adaptive boosting

ADALINE network model. See adaptive linear neural element network model

adaptive boosting, 86, 311

adaptive linear neural element (ADALINE) network model, 285–286

agglomerative hierarchical clustering, 258–259

AI. See artificial intelligence (AI)

Aibo, 10

Alpha error, 141

AlphaGo program, 2, 29

alternate hypothesis, 141

ANN. See artificial neural network (ANN)

anomaly checking, clustering, 244

anti-monotone property of support measure, 265

Apriori algorithm, for association rule learning, 264–265, 309

Apriori principle rules, 265–268

area of property, 217, 227

area under curve (AUC) value, 80–81

artificial intelligence (AI), 1, 2, 243

artificial neural network (ANN), 273

adaptive linear neural element network model, 285–286

backpropagation, 292–296

competitive network, 289

direction of signal flow, 291

McCulloch–Pitts neural model, 279–281

multi-layer feed forward network, 288–289

number of layers, 290–291

number of nodes in layers, 291

recurrent neural networks, 289–290

Rosenblatt’s perceptron. See Rosenblatt’s perceptron

single-layer feed forward network, 287–288

structure of, 275

weight of interconnection between neurons, 291–292

artificial neurons, 273, 274–275, 287

association analysis, 16, 242, 261

application of, 261

itemset, 262

support count, 262

association rule, 262–264. See also association analysis

Apriori algorithm for, 264–265

frequent itemset, 265, 267

strengths and weaknesses, 268

strong rules, 265, 267

association rule learning algorithm, 308–309

attributes, 35, 36–37

AUC value. See area under curve value

Auto MPG data set, 36, 326, 366

box plot of, 43

histogram, 46–47

‘horsepower’, 38–39

mean versus median for, 38

scatter plot, 51

autoencoders, 304–305

axon, 274

B

backpropagation algorithm, 292–296

backpropagation networks, 278, 294

backward phase, epochs, 292

backward stepwise selection, 232

bagging. See bootstrap aggregation

banking industry, machine learning in, 20

Bayes optimal classifier, 156–157

Bayes rule. See Bayes’ theorem

Bayes’ theorem, 2, 121–122, 150–151, 158

concept learning and, 154–157

likelihood, 152

posterior probability, 151–152

prior knowledge, 151

Bayesian Belief network, 165–166, 171

conditional independence, 166–170

independence, 166–170

in machine learning, 170

Bayesian classifiers, 149

Bayesian concept learning. See also Bayes’ theorem; Bayesian Belief network

brute-force Bayesian algorithm, 154–156

consistent learners, 156

methods, 148, 149–150

Naïve Bayes classifier. See Naïve Bayes classifiers

optimal classification, 156–157

Bayesian interpretation, 119

Bernoulli distributions, 127

best linear unbiased estimator (BLUE), 229

best subset selection, 232

Beta error, 141

bias, 63

bias-variance trade-off, 73–74

big data, 117

binary sigmoid function, 277–278

binomial distribution, 127–128

bioinformatics, 170, 209

biological neural network, 273, 274. See also artificial neural network (ANN)

biological neuron, 273–274

biplot function, 342

bipolar sigmoid function, 278

bivariate random variables, 134–135

BLUE. See best linear unbiased estimator (BLUE)

boosting, 86, 310–311

bootstrap aggregation, 86, 310

bootstrap sampling, 70, 71, 335, 375–376

box and whisker plot. See box plots

box plots, 41–43, 369–370

Auto MPG attributes, 43

cylinders, 43–44

data exploration, 329–330

displacement, 44–45

model year, 45

origin, 44

branch and bound search, decision tree, 190–191

branch node, 187

brute-force approach, 266

brute-force Bayesian learning algorithm, 154–156

btissue data set, 343, 344

C

candidate hypothesis, 149, 152

capping, 54

caret package, 334, 347

categorical data, 33

exploring, 47–49

nominal data, 33–34

ordinal data, 34

categorical distribution, 129

cdf. See cumulative distribution function (cdf)

central limit theorem, 132, 138

central nervous system (CNS), 273–274

central tendency, 37–39

centroids, 249

chain rule, 120, 166

chi-square test, 234

class, 14, 178

class package, 346

classification algorithms, 180

decision tree. See decision tree

k-nearest neighbour, 181–186

random forest model, 199–201

support vector machines, 201–209

classification learning steps, 179–180

algorithm selection, 180

data pre-processing, 180

definition of training data set, 180

evaluation with test data set, 180

identification of required data, 179

problem identification, 179

training, 180

classification model, 66, 177–178, 182

classification phase, bootstrap aggregation, 310

classification problem, 12

cluster centroids, recomputing, 250–254

cluster package, 338, 349

clustering, 16, 242, 255, 377

anomaly checking, 244

customer segmentation, 243

data mining, 244

of data set, 251

different methods, 246–247

external evaluation, 84

initial centroids, 252

internal evaluation, 82–84

as machine learning task, 244–246

partitioning-based. See partitioning-based clustering

text data mining, 243

CNS. See central nervous system (CNS)

competitive network, 289

Comprehensive R Archive Network (CRAN), 315

computational complexity, 306

concept learning, 150, 154–157

conditional distributions, 136–137

conditional independence, 166–170

conditional probability, 120–121, 165

confusion matrix, 76

confusionMatrix function, 336

consistent learners, 156

construct frequency table, 162

contains (), 324

contingency table. See two-way cross-tabulations

continuous numeric features, 164–165

continuous random variables, 125–126

mean and variance, 126

uniform distribution, 130–131

converging connection, 169–170

convex hull, 206

correlation, 137–138

correlation-based similarity measure, 106

cosine similarity, 109, 110

cost function, 64

covariance, 137–138

CPython, 21

CRAN. See Comprehensive R Archive Network (CRAN)

cross-tab. See two-way cross-tabulations

cross-validation, 71

cumulative distribution function (cdf), 123, 124, 126

cumulative probability, 161

curve linear negative slope, 220–221

curve linear positive slope, 219–220

customer segmentation, clustering, 243

D

data, 32, 35–37

categorical, 32

collection, errors in, 53

dictionary, 35

interval, 34

nominal, 33–34

ordinal, 34

qualitative, 33–34

quantitative, 34–35

ratio, 34

data dispersion, 39–40

data exploration

data pre-processing, 332–334

plots for, 329–332, 368–371

statistical functions for, 326–329, 365–368

data frame, 319

data handling commands, 323

data holdout, 374

data input, 6, 62

data manipulation commands, 324–325

data matrix, 101

data mining, 244

data pre-processing, 56, 180, 332–334, 372

capping of values, 373

dimensionality reduction, 56

feature subset selection, 56–57

imputing standard values, 373

outliers and missing values, 372–373

data quality, 53

data remediation, 53

handling outliers, 54

missing values, 54–55

data set, 32, 150

Auto MPG, 36

features, 92

five-dimensional, 92

data spread, 39

data dispersion, 39–40

data value position, 40–41

data types

mathematical operations on, 322–323

Python, 357–358

R language, 318–319

data value position, 40–41

datasets, 369

DBSCAN, 260–261

decision node, 187

decision theory, 140

decision tree, 14, 186–187

algorithm for, 197

application, 198–199

avoiding overfitting in, 197–198

branch and bound search, 190–191

building, 188–190

entropy of, 191–196

example, 188

exhaustive search, 190

information gain, 192–197

output variable, 187

post-pruning, 197

pre-pruning, 197, 198

pruning of, 197

strengths, 198

structure, 187

types of nodes, 187

weaknesses, 198

decision tree classifier, 347, 387–388

delta rule, 286

dendrites, 274

dendrogram, 258

density-based clustering, 260–261

dependent variable, 216–217, 222, 227–229, 234

descriptive model, 16, 66–67

digital neurons, 273

dimensionality reduction, 56, 232

discrete bivariate random variable, 135, 136

discrete distribution, 129

discrete random variable, 123–125

distance-based clustering, 16

distance-based similarity measure, 106–110

distribution function, 123

diverging connection, 169

divisive hierarchical clustering, 258–259

document-term matrix, 98

double-sided exponential distribution, 134

dplyr package, 324, 339

dummy code categorical variables, 339–340, 379

dummy encoding, 129

E

e1071 package, 345, 346

eager learning, 71

Eclat algorithm, 309

eigenvalues, 101, 102

eigenvectors, 101, 102

elastic net, 311

elbow method, 249

initial centroids, 249–250

recomputing cluster centroids, 250–254

elbow point, 249, 250

embedded approach, 112

encoding categorical (nominal) variables, 95–97

encoding categorical (ordinal) variables, 97, 340, 380–381

ends_with (), 324

ensemble learning algorithms, 309–311

ensembling, 85, 86, 199

entropy, of decision tree, 191–196

epochs, 292

backward phase, 292

forward phase, 292

error(s)

in data collection, 53

due to bias, 73

due to variance, 73–75

error function. See cost function

error rate, 77

Euclidean distance, 106, 183, 250–251, 307

Euclidean space, 100

evaluation criterion, 110

exclusive-OR (XOR) circuit, 279

exhaustive search, decision tree, 190

expected error reduction, 306

expected model change, 306

expert system, 11

F

factor, 319

feature, 92

distance measures between, 108

entropy, 106

n-dimensional data set, 107

feature construction, 93, 94–95

dummy coding categorical (nominal) variables, 339–340

encoding categorical (nominal) variables, 95–97

encoding categorical (ordinal) variables, 97, 340

text-specific feature construction, 97–99

transforming numeric (continuous) features, 97, 341

feature discovery, 93

feature engineering, 93

feature extraction. See feature extraction

feature subset selection. See feature subset selection

feature extraction, 93, 99

linear discriminant analysis, 102, 343–344

principal component analysis, 100–101, 341–342

singular value decomposition, 101–102, 342–343

feature redundancy, 105–110

feature relevance, 104–106

feature selection. See feature subset selection

feature subset selection, 56–57, 93, 102, 344–345

approaches, 111–112

feature redundancy, 105–110

feature relevance, 104–106

high-dimensional data, 103–104

process, 110–111

feature transformation, 93

feature construction. See feature construction

feature extraction. See feature extraction

feature vectors, 100

feed forward, 287

filter approach, 111

F-measure, 79–80

for loop, 320

forward phase, epochs, 292

forward stepwise selection, 232

for–while loops, Python, 358–359

foundation rules, 119–120

fraud detection, 29

frequency table, 161, 162

frequent itemset, 265

frequentist interpretation, 119

FSelector package, 344

full batch gradient descent, 295

G

Gaussian (normal) distribution, 131–133

Gaussian function, 307

Gaussian radial filter, 308

Gaussian RBF kernel, 208

Gauss–Markov theorem, 229

GBM. See gradient boosting machines (GBM)

generalization, 6, 9, 62

generation versus recognition, 303

ggplot2 library functions, 329

glial cells, 274

Go board game, 2

Google, 2, 29

Google Brain, 2, 29

GPU. See graphics processing unit (GPU)

gradient boosting machines (GBM), 311

gradient descent, 292

graphics processing unit (GPU), 296

H

Hamming distance, 107

healthcare, machine learning in, 21

heteroskedasticity, 229–230

hierarchical clustering, 258

agglomerative, 258–259

dendrogram representation, 260

distance measure, 259–260

divisive, 258–259

high-dimensional data set, 103–104

histogram, 45–47, 331, 370–371

Auto MPG attributes, 46–47

box plot and, 45

shapes, 46

holdout method, 67–68, 334

homogeneous group, 246

horsepower attribute, 38–39, 55, 328

human detection, 62

human learning, 4, 7, 62

under expert guidance, 4–5

knowledge gained from experts, 5

by self, 5

types of, 4–5

hybrid approach, 112

hybrid recommender system, 163–164

hyperbolic tangent function, 278–279

hyperplane, 201, 202–203

hypothesis testing, 140–142

I

IBM, 1, 2, 22, 28, 177

ICA. See independent component analysis (ICA)

ICU. See intensive care unit (ICU)

identification of required data, 179

identity function, 276

if-else statement, 320–321, 359

imputation, 54, 55

incorrect sample set selection, 53

incremental gradient descent, 295

independence, 166–170. See also conditional independence

independent component analysis (ICA), 303–304

independent variables, 216, 227–230

information gain, of decision tree, 192–197

initial centroids, 249–250

instance-based learning, 306–308

insurance industry, machine learning in, 20

intensive care unit (ICU), 176

intercept, interpretation of, 224–225

interdependent, 118

internal node, 187

interval data, 34

iris data set, 15, 329–330, 342, 369–371

irrelevant variables, 231

itemset, 262, 265–266

J

Jaccard distance, 107

Jaccard index/coefficient, 107–108

joint cumulative distribution function, 135

joint distribution, 120, 135

joint probability, 120, 165, 166, 167

joint probability density functions, 136

joint probability mass functions, 135–136

Julia programming language, 23

K

Kappa value, 77

kernel trick, 207–208

kernels, 207

k-fold cross-validation method, 68–70, 335, 374–375

k-means algorithm, 67, 247–255, 349

appropriate number of clusters, 249

elbow method, 249–254

strengths and weaknesses, 248

k-medoids algorithm, 255–257

k-nearest neighbour (kNN) algorithm, 14

application, 186

category of lazy learner, 185–186

Euclidean distance, 183

strengths, 186

student data set, 181–183

weaknesses, 186

k-nearest neighbour (kNN) classifier, 346, 386–387

kNN algorithm. See k-nearest neighbour (kNN) algorithm

knowledge, 118

knowledge discovery, 16

L

L1 norm, 107

L2 norm, 107

lab schedule, machine learning in, 353–354

label, 12, 68, 176

labelled input data, 182

labelled training data, 12, 176

Laplace distribution, 134

lasso regression, 231

layers, neural network, 290–291

lazy learning, 71

LDA. See linear discriminant analysis (LDA)

leaf node, 187

learning algorithm, 180

learning process of machines, 61–62

learning rate, 296

least mean square (LMS), 286

least squares method, 2

leave-one-out cross-validation (LOOCV), 68, 70

level of significance, 141

likelihood, 152, 170

linear discriminant analysis (LDA), 102, 343–344, 384–385

linear kernel, 208

linear negative slope, 220

linear positive slope, 219

linear regression model, improving accuracy of, 230

dimensionality reduction, 232

shrinkage (regularization) approach, 231

subset selection, 231–232

linearly separable data, 206

list, 319

LMS. See least mean square (LMS)

logistic regression, 233–236

logit regression. See logistic regression

LOOCV. See leave-one-out cross-validation (LOOCV)

loops, 320–321

loss function, 64

M

machine learning (ML), 1, 7, 29

abstraction, 6, 8

activities, 30–32

algorithms, 14

applications of, 20–21

in banking industry, 20

data. See data

data input, 6, 8

definition, 5–6

evolution of, 2–3

formalism, 9

foundation of, 2

generalization, 6, 9

in healthcare, 21

in insurance industry, 21

issues, 23

languages/tools, 21–23

problem solving, 9–10

problems not using, 20

process, 6

reinforcement learning, 17–18, 19

supervised learning, 11–15, 19

types, 11–20

unsupervised learning, 16–17, 19

Manhattan distance, 107

MAP hypothesis. See maximum a posteriori (MAP) hypothesis

margin, 203

marginal distribution, 120

market basket analysis, 261

Markov chain Monte Carlo (MCMC), 142

MASS package, 343

matches (), 324

mathematical operations on data types, 322–323

MATLAB (matrix laboratory), 22

matplotlib, 368

matrix, 318–319

maximum a posteriori (MAP) hypothesis, 152, 156, 171

maximum likelihood estimation (MLE), 236

maximum likelihood (ML) hypothesis, 152

maximum margin hyperplane (MMH), 205

linearly separable data, 206

non-linearly separable data, 207

support vectors, 206

maximum point of curves, 226–227

McCulloch–Pitts neural model, 279–281

MCMC. See Markov chain Monte Carlo

mean, 37, 38

mean of random variable, 126, 128, 131

median, 37, 38

memory-based learning, 306–308

merger points, clusters, 258

minimum marginal hyperplane (MMH), 306

minimum point of curves, 226–227

Minkowski distance, 107

missing values, 54

estimating, 55

imputing, 55

records elimination, 54

mixed bivariate random variable, 135

ML. See machine learning (ML)

MLE. See maximum likelihood estimation (MLE)

MMH. See maximum margin hyperplane (MMH); minimum marginal hyperplane (MMH)

mode, 34, 49, 55

model, 8

classification, 75–76

definition, 63

descriptive, 66–67

evaluating performance of, 75–84

improving performance of, 85–86

predictive, 65–66

representation and interpretability, 72–75

selecting, 63–67

sensitivity of, 78

specificity of, 78–79

training, 67–72

model accuracy, 76

model parameter tuning, 85

model training, 63

bootstrap sampling, 335

classification, supervised learning, 336–337

clustering, 338–339

holdout, 334

k-fold cross-validation, 335

regression, supervised learning, 337–338

Monte Carlo approximation, 142

Monte Carlo integration, 142

multi layer feed forward network, 394

multicollinearity, 229

multi-layer feed forward network, 288–289, 292

multi-layer feedforward neural network, 350, 352

multi-layer perceptron, 284–285

multinomial distribution, 128–129

multinoulli distribution, 128–129

multiple linear regression, 227–228, 349

heteroskedasticity, 229–230

multicollinearity, 229

multiple random variables

bivariate random variables, 134–135

conditional distributions, 136–137

covariance and correlation, 137–138

joint distribution functions, 135

joint probability density functions, 136

joint probability mass functions, 135–136

mutate function, 339

mutual information, 105

N

Naïve Bayes classifiers, 171, 346, 386

applications, 163–164

assumption, 167

benefits, 159

continuous numeric features, 164–165

parameter estimation for, 158

principles, 158

problem with, 161–163

steps, 161

strengths and weaknesses, 159, 160

training data for, 160

naiveBayes function, 346

n-dimensional data set, 92

nerve cell, 273

nervous system, 273

nesting functions, 324

network security, 149

neural network, 302, 392–395. See also artificial neural network (ANN)

multi-layer feedforward, 350, 352

single-layer feedforward, 350, 351

neuralnet function, 350

neurolab, 392

neurons, 273, 274

‘No Free Lunch’ theorem, 65

nodes in layers, 291

noise-free training data, 156

nominal data, 33–34

non-linearly separable data, 207

normal random variable, 131–133

null hypothesis, 141

numerical data, 34

box plots, 41–45

central tendency, 37–39

data dispersion, 39–40

data spread, 39

data value position, 40–41

exploring, 37–41

histogram, 45–47

interval data, 34

plotting, 41–47

ratio data, 34–35

numpy library, 367

O

objective function, 64

OLS. See ordinary least squares (OLS)

one-hot encoding, 129

one_of (), 324

online sentiment analysis, 164

OOB error. See out-of-bag (OOB) error

ordinal data, 34

ordinary least squares (OLS), 223, 226

outliers, 53, 54

out-of-bag (OOB) error, 200

overfitting, 73, 197–198

P

PAM algorithm. See partitioning around medoids (PAM) algorithm

pandas library, 361, 378

partial regression coefficients, 227

partitioning around medoids (PAM) algorithm, 256–257

partitioning-based clustering, 247

k-means algorithm, 247–255

k-medoids algorithm, 255–257

pattern discovery, 16

patterns, 15

PCA. See principal component analysis (PCA)

pdf. See probability density function (pdf)

Pearson correlation coefficient, 106

peripheral nervous system, 273

piping, 324

plyr package, 324

pmf. See probability mass function (pmf)

Poisson distribution, 129

polynomial kernel, 208

polynomial regression model, 232–233

posterior probability, 151–152, 154, 155, 156, 158, 171

prcomp function, 341

precision, 79

prediction, 230

predictive models, 65–66

predictors, 216

preparation, machine learning system, 30

price of property, 217, 227

principal component analysis (PCA), 100–101, 303, 341–342, 381–383

prior knowledge, 151, 165

prior probability, 158, 170

probabilistic classifications, 158

probabilistic inference process, 170

probability

posterior, 151–152, 154–156

prior, 151, 158, 170

revised, 168

rules, 148

unconditional, 168

probability density function (pdf), 125, 126

probability mass function (pmf), 123, 124, 127

probability rule, 151, 162

probability theory, 117

Bayes rule, 121–122

Bayesian interpretation, 119

central limit theorem, 138

chain rule, 120

concept, 118–119

conditional, 120–121

of correct decisions, 142

foundation rules, 119–120

frequentist interpretation, 119

hypothesis testing, 140–142

joint, 120

Monte Carlo approximation, 142

random variables. See random variables

sampling distributions, 138–140

sum rule, 120, 125

type I and type II errors, 141

union of two events, 120

problem identification, 179

product rule, 120

pruning of decision tree, 197

purity, cluster algorithms, 84

Python, 21–22

basic commands, 355–357

bootstrap sampling, 375–376

classification model, 376–377

clustering, 377

data exploration. See data exploration

data handling commands, 361–365

data holdout, 374

data pre-processing, 372–373

data types, 357–358

feature construction, 378–381

feature extraction, 381–385

feature subset selection, 385

for–while loops, 358–359

if–else statement, 359

k-fold cross-validation, 374–375

machine learning lab using, 396–397

mathematical operations, 360–361

model training, 376

neural network, 392–395

purity, 377

regression model, 377

scripts, 356

sklearn framework, 376

supervised learning. See supervised learning

variables, 357–358

writing functions, 359–360

Python Anaconda, 355

Python Software Foundation, 21

Q

qualitative data. See categorical data

quantitative data. See numerical data

query by committee, 306

R

R language, 22

basic commands, 317–318

boxplot, 329–330

cluster, 338–339

data exploration. See data exploration

data types, 318–319

histogram, 331

installation, 315

loops, 320–321

mathematical operations on data types, 322–323

model training. See model training

modelling and evaluation, 334

scatterplot, 331–332

scripts management, 316

writing code in, 316

writing functions, 321

radial basis function (RBF), 307–308

radial basis function network (RFFN), 307

radial function, 308

random forest classifier, 347–348, 388

random forest model, 85, 199

application, 201

out-of-bag error in, 200

strengths, 200–201

weaknesses, 201

random numbers, 67

random sample, 139

random variables, 122

Bernoulli, 127

binomial, 127–128

bivariate, 134–135

continuous, 125–126

discrete, 123–125

domain of, 122

multinomial and multinoulli, 128–129

multiple. See multiple random variables

normal, 131–133

Poisson, 129

standard normal, 132, 133, 138

uniform, 130–131

randomForest function, 347

ratio data, 34

RBF. See radial basis function

recall, 79

receiver operating characteristic (ROC) curve, 80–81

recognition, generation versus, 303

record, 32

rectified linear unit (ReLU) function, 277

recurrent neural networks, 289–290

recursive partitioning, 188

regression, 12, 14–15, 377

assumptions, 228–229

common algorithms, 217

example of, 216

logistic, 233–235

maximum likelihood estimation, 236

multiple linear, 227–230

polynomial regression model, 232–233

simple linear. See simple linear regression

supervised learning, 81–82

regularization algorithms, 311

reinforcement learning, 17–18, 19

ReLU function. See rectified linear unit (ReLU) function

remove outliers, 54

repeated holdout, 68

representation learning, 301–302

active learning. See active learning

association rule learning algorithm, 308–309

autoencoders, 304–305

clustering forms, 305

ensemble learning algorithms, 309–311

generation versus recognition, 303

independent component analysis, 303–304

instance-based learning, 306–308

multilayer perceptron, 303

regularization algorithms, 311

supervised neural networks, 303

triangle types, 302

residual, 82, 222

revised probability, 168

RFFN. See radial basis function network (RFFN)

ridge regression, 231

risk prediction, 29

ROC curve. See receiver operating characteristic (ROC) curve

root node, 187

Rosenblatt’s perceptron, 281–282

class assignment, 283

class separability, 284

classification by decision boundary, 283

classification with two decision lines, 285

decision boundary, 282

multi-layer perceptron, 284–285

rpart package, 347

R-squared, 82, 377

rule of total probability, 120

S

sampling distributions, 138–140

mean and variance, 140

with replacement, 139

without replacement, 139

sampling theory, 70

SAS. See Statistical Analysis System (SAS)

scatter plot, 49–51, 331–332, 371

scikit-learn, 22, 373, 374

scripts management, in R language, 316

semi-supervised learning, 176, 305

sensitivity of model, 78

serial connection, 169

set.seed function, 335

Shannon’s formula, 106

shrinkage (regularization) approach, 231

Sibyl, 29

sigmoid function, 277

binary, 277–278

bipolar, 278

sigmoid kernel, 208

signal flow direction, neural network, 291

silhouette coefficient, 83

silhouette width, 83–84, 378, 390

simple hypothesis, 141

simple linear regression, 217–218, 349

error in, 221

example, 221–225

maximum and minimum point of curves, 226–227

no relationship graph, 221

ordinary least squares algorithm, 226

slopes, 218–221

simple matching coefficient (SMC), 108–109

Simple Random Sampling with Replacement (SRSWR), 70

single-layer feed forward network, 287–288

single-layer feedforward neural network, 350, 351

single-valued real function, 122

singular value decomposition (SVD), 101–102, 342–343, 383–384

slopes, linear regression model, 218–219

curve linear negative slope, 220–221

curve linear positive slope, 219–220

linear negative slope, 220

linear positive slope, 219

SMC. See simple matching coefficient (SMC)

soma, 274

spam filtering, 163

specificity of model, 78–79

spine.csv, 338

spinem.csv, 338

split, clusters, 258

SPSS. See Statistical Package for the Social Sciences (SPSS)

Spyder (Scientific PYthon Development EnviRonment), 355

squares of the errors (SSE), 223

squashing function. See threshold activation function

SRSWR. See Simple Random Sampling with Replacement (SRSWR)

state space, 124

Statistical Analysis System (SAS), 22

Statistical Package for the Social Sciences (SPSS), 22–23

stats package, 341, 342

step function, 276

stepwise subset selection method, 232

stochastic gradient descent, 295

stopping criterion, 111

strong rules, 265

subset generation, 110

subset selection, linear regression model, 231–232, 385

backward stepwise, 232

best, 232

forward stepwise, 232

sum of squared error (SSE), 248, 249, 251, 256

sum rule, 120, 125

summary commands, 326

summary function, 337

summation junction, 275

supervised learning, 11, 19, 29, 176. See also unsupervised learning

bootstrap sampling, 70–71

classification, 13–14, 75–81, 336–337, 345

classification algorithms. See classification algorithms

classification learning steps. See classification learning steps

classification model, 177–178, 376–377, 386–389

decision tree classifier, 347

example of, 12, 176–177

holdout method, 67–68

k-fold cross-validation method, 68–70

kNN classifier, 346

lazy versus eager learner, 71

Naïve Bayes classifier, 346

random forest classifier, 347–348

regression, 14–15, 81–82, 337–338, 349, 389–390

SVM classifier, 348

unsupervised learning versus, 176, 242

support count, 262

support vector machines (SVM), 201

application, 209

classification using hyperplanes, 201–203

generalization error, 202

hard margin, 202

hyperplane, 201, 202–203

identifying correct hyperplane, 203–205

kernel trick, 207–208

margin, 203

maximum margin hyperplane, 205–207

strengths, 208

weaknesses, 208

support vectors, 202, 206

support-based pruning, 265

SVD. See singular value decomposition (SVD)

svd function, 342

SVM. See support vector machines (SVM)

SVM classifier, 348, 389

synapse, 274, 275

T

target function, 64

10-fold cross-validation, 68

term-document matrix, 98

test data, 13, 180, 182

text data mining, 243

text-based classification, 149, 163

text-specific feature construction, 97–99

threshold activation function, 275

threshold function, 276

TID list. See Transaction IDs (TID list)

total probability rule, 120

train function, 347

training data, 12, 151, 154, 176, 180, 182

‘training data is labelled’, 176

training, learning algorithm, 180

training phase, bootstrap aggregation, 310

Transaction IDs (TID list), 309

transforming numeric (continuous) features, 97, 341

triangle types, 302

two-way cross-tabulations, 51–52

type I error, 141

type II error, 141

U

uncertainty, 118

uncertainty sampling, 306

unconditional probability, 168

underfitting, 72

uniform distribution, 124, 125, 130–131

unstructured data, 92

unsupervised learning, 15–17, 19, 82–84, 105, 241, 349–350. See also supervised learning

application of, 242–243

clustering, 338–339, 377–378, 390–392

supervised learning versus, 242

V

validation, 111

validation data, 68

variable reduction, linear regression model, 232

variables, exploring relationship between, 49–52

variance, errors due to, 73–75

variance inflation factor (VIF), 229

variance of random variable, 126, 128, 131

variance reduction, 306

vector, 318

vector spaces, 100

vectorization process, 98

VIF. See variance inflation factor (VIF)

Voronoi diagram, 251

W

Waymo, 29

weight of interconnection between neurons, 291–292

while loop, 320

wrapper approach, 111, 112

writing functions, 321

X

XOR circuit. See exclusive-OR (XOR) circuit
