Index

Symbols

+1 trick, 38, 43–45, 336, 521

1-NN model, 145, 154–156

for the circle problem, 464

3-NN model, 66, 193

3D datasets, 460–461

80–20 rule, 83

Σ, in math, 30

A

accuracy, 15, 27, 163

calculating, 62

fundamental limits of, 163

accuracy_score, 62

AdaBoost, 400, 405

AdaBoostClassifier, 400, 403–406

additive model, 318

aggregation, 390

algorithms

analysis of, 72

genetic, 101

less important than data, 15

amoeba (StackExchange user), 465

analytic learning, 18

Anderson, Edgar, 56

ANOVA test, 463

area under the curve (AUC), 177–178, 182–193, 202

arguments, of a function, 362

arithmetic mean, 170, see also average

array (NumPy), 276, 494

assessment, 113–115

assumptions, 55, 270, 282, 286–287, 439

attributes, 45

average

computing from confusion matrix, 170

simple, 30

weighted, 31–32, 34, 89

average centered dot product, see covariance

B

background knowledge, 322, 331, 439

bag of global visual words (BoGVW), 483, 488–490

bag of visual words (BoVW), 481–483

transformer for, 491–493

bag of words (BOW), 471–473

normalizing, 474–476

bagged classifiers

creating, 394

implementing, 407

bagging, 390, 394

basic algorithm for, 395

bias-variance in, 396

BaggingRegressor, 407

base models

overfitting, 396

well-calibrated, 407

BaseEstimator, 311

baseline methods, 159–161, 189, 191

baseline regressors, 205–207

baseline values, 356

basketball players, 397

Bayes optimal classifier, 464

betting odds, 259–262

bias, 110, 144–145, 292

addressing, 350–351

in combined models, 390

in SVCs, 256–259

number of, 148

reducing, 396, 400, 406

bias-variance tradeoffs, 145–149, 154, 396

in decision trees, 249

in performance estimating, 382

big data, 71

Big-O analysis, 82

bigrams, 471

binary classification, 55, 174, 267

confusion matrix for, 164

binomials, 524

bivariate correlation, 415

black holes, models of, 467

body mass index (BMI), 322, 410–411

boosting, 398–401, 406

bootstrap aggregation, see bagging

bootstrap mean, 391–393

bootstrapping, 157, 390–394

Box, George, 69

C

C4.5, C5.0, CART algorithms, 244

calculated shortcut strategy, 100–101, 104

Caltech101 dataset, 482–483

Calvinball game, 67

card games, 21

rigged, 68–69

case-based reasoning, 18

categorical coding, 332–341

categorical features, 5–7, 18, 346

numerical values for, 85–86

categories, 332

predicting, 9–10

Cauchy-Schwarz inequality, 463

causality, 233

Celsius, converting to Fahrenheit, 325–326

center, see mean

classification, 7, 55–58

binary, 55, 164, 174, 267

nonlinear, 418–419

classification_report, 169–170

ClassifierMixin, 202

classifiers

baseline, 159–161, 189, 191

comparing, 287–290, 368

evaluating, 70–71, 159–203, 238–239

making decisions, 55–56

simple, 63–81

smart, 189

closures, 382, 394

clustering, 18, 479–481

on subsets of features, 494

coefficient of determination, 130

coin flipping, 21

and binomials, 524

increasing number of, 25–27

collections, 20

collinearity, 340, 356

combinations, 41

combinatorics, 423

complexity, 124–125

cost of increasing, 12

evaluating, 152–154, 363–365

manipulating, 119–123

penalizing, 300, 306, 502

trading off for errors, 125–126, 295–301

complexity analysis, 82

compound events, 22–23

compression, 13

computational learning theory, 15

computer graphics, 82

computer memory, see memory

computer science, 362

confounding factors, 233

confusion matrix, 164, 171–178

computing averages from, 168, 170

constant, 160–161

constant linear model, 146

constants, 35–38

contrast coding, 356

conveyor belt, 377

convolutional neural network, 516

corpus, 472

corrcoef (NumPy), 416

correctness, 11–12

correlation, 415–417, 423, 464

squared, 415–417

Cortez, Paulo, 195, 203

cosine similarity, 462–463

cost, 126–127

comparing, for different models, 127

lowering, 299, 497–500

of predictions, 56

CountVectorizer, 473

covariance, 270–292, 415–417

between all pairs of features, 278

exploring graphically, 292

length-normalized, 463

not affected by data shifting, 451

visualizing, 275–281

covariance matrix (CM), 279–283, 451, 456

computing, 455

diagonal, 281

eigendecomposition of, 452, 456, 459

for multiple classes, 281–282

CRISP-DM process, 18

cross-validation (CV), 128–131

2-fold, 132–133

3-fold, 128–130

5-fold, 129–130, 132

as a single learner, 230

comparing learners with, 154–155

extracting scores from, 192

feature engineering during, 323

flat, 370–371, 376

leave-one-out, 140–142

minimum number of examples for, 152

nested, 157, 370–377

on multiple metrics, 226–229

with boosting, 403

wrapping methods inside, 370–372

cross_val_predict, 192, 230

cross_val_score, 130, 132, 137, 196, 207, 379

Cumulative Response curve, 189

curves, 45–47

using kernels with, 461

cut, 328, 330–331

D

data

accuracy of, 15

big, 71

centering, 221, 322, 325, 445–447, 451, 457

cleaning, 323

collecting, 14

converting to tabular, 470

fuzzy towards the tails, 88

geometric view of, 410

incomplete, 16

making assumptions about, 270

modeling, 14

more important than algorithms, 15

multimodal, 327, 357

noisiness of, 15

nonlinear, 285

preparing, 14

preprocessing, 341

reducing, 250–252, 324–325, 461

redundant, 324, 340, 411

scaling, 85, 221, 445, 447

sparse, 333, 356, 471, 473

standardized, 105, 221–225, 231, 315–316, 447

synthetic, 117

total amount of variation in, 451

transforming, see feature engineering

variance of, 143, 145, 445

weighted, 399–400

DataFrame, 323, 363–364

datasets

3D, 460–461

applying learners to, 394

examples in, 5

features in, 5

finding relationships in, 445

missing values in, 322

multiple, 128, 156

poorly represented classes in, 133

reducing, 449

single, distribution from, 390

testing, see testing datasets

training, see training datasets

datasets.load_boston, 105, 234

datasets.load_breast_cancer, 84, 203

datasets.load_digits, 319

datasets.load_wine, 84, 203

decision stumps, 399, 401–403

decision trees (DT), 239–249, 290–291, 464

bagged, 395

bias-variance tradeoffs in, 249

building, 244, 291

depth of, 241, 249

flexibility of, 313

for nonlinear data, 285–286

performance of, 429–430

prone to overfitting, 241

selecting features in, 325, 412

unique identifiers in, 241, 322

viewed as ensembles, 405

vs. random forests, 396

DecisionTreeClassifier, 247

decomposition, 452, 455

deep neural networks, 481

democratic legislature, 388

dependent variables, see targets

deployment, 14

Descartes, René, 170

design matrix, 336, 347

diabetes dataset, 85, 105, 322, 416

diagonal covariance matrix, 281

Diagonal Linear Discriminant Analysis (DLDA), 282–285, 292

diagrams, drawing, 245

dice rolling, 21–24

expected value of, 31–32

rigged, 68–69

Dietterich, Tom, 375

digits dataset, 287–290, 401

Dijkstra, Edsger, 54

directions, 441, 445

finding the best, 449, 459

with PCA, 451

discontinuous target, 308

discretization, 329–332

discriminant analysis (DA), 269–287, 290–292

performing, 283–285

variations of, 270, 282–285

distances, 63–64

as weights, 90

sum product of, 275

total, 94

distractions, 109–110, 117

distributions, 25–27

binomial, 524

from a single dataset, 390

normal, 27, 520–524

of the mean, 390–391

random, 369

domain knowledge, see background knowledge

dot, 29–30, 38, 47–52, 245, 455

dot products, 29–30, 38, 47–52

advantages of, 43

and kernels, 438–441, 458–459, 461

average centered, see covariance

length-normalized, 462–463

double cross strategy, 375

dual problem, solving, 459

dummy coding, see one-hot coding

dummy methods, see baseline methods

E

edit distance, 439, 464

educated guesses, 71

eigendecomposition (EIGD), 452, 456, 458, 465–466

eigenvalues and eigenvectors, 456–457

Einstein, Albert, 124

ElasticNet, 318

empirical loss, 125

ensembles, 387–390

enterprises, competitive advantages of, 16

entropy, 464

enumerate, 494

enumerate_outer, 491–492, 494

error plots, 215–217

errors

between predictions and reality, 350

ignoring, 302–305

in data collection process, 322

in measurements, 15, 142–143, 241

margin, 254

measuring, 33

minimizing, 448–449, 451

negating, 207

positive, 33

sources of, 145

trading off for complexity, 125–126, 295–301

vs. residuals, 218

vs. score, 207

weighted, 399

estimated values, see predicted values

estimators, 66

Euclidean distance, 63, 367

Euclidean space, 466

evaluation, 14, 62, 109–157

deterministic, 142

events

compound vs. primitive, 22–23

probability distribution of, 25–27

random, 21–22

examples, 5

dependent vs. independent, 391

distance between, 63–64, 438–439

duplicating by weight, 399

focusing on hard, 252, 398

grouping together, 479

learning from, 4

quantity of, 15

relationships between, 434

supporting, 252

tricky vs. bad, 144

execution time, see time

expected value, 31–32

extract-transform-load (ETL), 323

extrapolation, 71

extreme gradient boosting, 406

extreme random forest, 397–398

F

F1 calculation, 170

f_classif, 422

f_regression, 416–417

Facebook, 109, 388

factor analysis (FA), 466

factorization, 452, 455

factory machines, 79, 114

choosing knob values for, 115, 144, 156, 337

stringing together, 377

testing, 110–113

with a side tray, 65

Fahrenheit, converting to Celsius, 325–326

failures, in a legal system, 12

fair bet, 259

false negative rate (FNR), 164–166

false positive rate (FPR), 164–166, 173–181

Fawcett, Tom, 18

feature construction, 322, 341–350, 410–411

manual, 341–343

with kernels, 428–445

feature engineering, 321–356

how to perform, 324

limitations of, 377

when to perform, 323–324

feature extraction, 322, 470

feature selection, 322, 324–325, 410–428, 449

by importance, 425

formal statistics for, 463

greedy, 423–424

integrating with a pipeline, 426–428

model-based, 423–426

modelless, 464

random, 396–397, 423, 425

recursive, 425–426

feature-and-split

finding the best, 244, 397

random, 397–398

feature-pairwise Gram matrix, 464

feature_names, 413–414

features, 5

categorical, 7, 346

causing targets, 233

conditionally independent, 69

correlation between, 415–417

counterproductive, 322

covariant, 270

different, 63

evaluating, 462–463

interactions between, 343–348

irrelevant, 15, 241, 324, 411

number of, 146–148

numerical, 6–7, 18, 225, 343–344, 346

relationships between, 417

scaling, 322, 325–329

scoring, 412–415

sets of, 423

standardizing, 85

training vs. testing, 60–61

transforming, 348–353

useful, 15, 412

variance of, 412–415

Fenner, Ethan, 237–238

Fisher’s Iris Dataset, see iris dataset

Fisher, Sir Ronald, 56

fit, 224–225, 337, 363, 367–368, 371–372, 379, 381

fit-estimators, 66

fit_intercept, 340

fit_transform, 326, 413

flash cards, 398

flashlights, messaging with, 417–418

flat surface, see planes

flipping coins, 21

and binomials, 524

increasing number of, 25–27

float, 52–53

floating-point numbers, 52–53

fmin, 500

folds, 128

forward stepwise selection, 463

fromiter (NumPy), 494

full joint distribution, 148

functions

parameters of

vs. arguments, 362

vs. values, 360–361

wrapping, 361, 502

FunctionTransformer, 348–349

functools, 20

fundraising campaign, 189

future, predicting, 7

fuzzy specialist scenario, 405

G

gain curve, see Lift Versus Random curve

games

expected value of, 32

fair, 259

sets of rules for, 67

Gaussian Naive Bayes (GNB), 82, 282–287

generalization, 59, 126

genetic algorithms, 101

geometric mean, 170

get_support, 413

Ghostbusters, 218

Gini index, 202, 245, 464

Galton regression, 7

global visual words, 483, 487–490

good old-fashioned (GOF) linear regression, 300–301, 519–521

and complex problems, 307

gradient descent (GD), 101, 292

GradientBoostingClassifier, 400, 403–406

Gram matrix, 464

graphics processing units (GPUs), 71, 82

greediness, for feature selection, 423–424

GridSearch, 363, 368, 377, 382, 405, 427–428

wrapped inside CV, 370–372

GridSearchCV, 368, 371–377

H

Hamming distance, 63

Hand and Till M method, 183–185, 197, 200, 202

handwritten digits, 287–290

harmonic mean, 170

Hettinger, Raymond, 54

hinge loss, 301–305, 465

hist, 22

histogram, 21

hold-out test set (HOT), 114–115

hyperparameters, 67, 115

adjusting, 116

choosing, 359

cross-validation for, 371–377, 380–382

evaluating, 363–368

for tradeoffs between complexity and errors, 126

overfitting, 370

random combinations of, 368–370

tuning, 362–369, 380–382

hyperplanes, 39

I

IBM, 3

ID3 algorithm, 244

identification variables, 241, 322, 324

identity matrix, 456, 465

illusory correlations, 233

images, 481–493

BoVW transformer for, 491–493

classification of, 9

describing, 488–490

predicting, 490–491

processing, 485–487

import, 19

in-sample evaluation, 60

independence, 23

independence assumptions, 148

independent component analysis (ICA), 466

independent variables, see features

indicator function, 243

inductive logic programming, 18

infinity-norm, 367

information gain, 325

information theory, 417

input features, 7

inputs, see features

intercept, 336–341

avoiding, 356

International Standard of scientific abbreviations (SI), 73

iris dataset, 56–58, 60–61, 82, 133, 166–168, 174, 190–195, 242, 245, 329–332, 336, 480, 495

IsoMap, 462

iteratively reweighted least squares (IRLS), 291

itertools, 20, 41

J

jackknife resampling, 157

jointplot, 524–525

Jupyter notebooks, 19

K

k-Cross-Validation (CV), 129–131

with repeated train-test splits, 137

k-Means Clustering (k-MC), 479–481

k-Nearest Neighbors (k-NN), 64–67

1-NN model, 145, 154–156, 464

3-NN model, 66, 193

algorithm of, 63

bias-variance for, 145

building models, 66–67, 91

combining values from, 64

evaluating, 70–71

metrics for, 162–163

for nonlinear data, 285

performance of, 74–76, 78–81, 429–430

picking the best k, 113, 116, 154, 363–365

k-Nearest Neighbors classification (k-NN-C), 64

k-Nearest Neighbors regression (k-NN-R), 87–91

comparing to linear regression, 102–104, 147, 229

evaluating, 221

vs. piecewise constant regression, 310

Kaggle website, 406

Karate Kid, The, 182, 250

Keras, 82

kernel matrix, 438

kernel methods, 458

automated, 437–438

learners used with, 438

manual, 433–437

mock-up, 437

kernels, 438–445

and dot products, 438–441, 458–459, 461

approximate vs. full, 436

feature construction with, 428–445

linear, 253, 438

polynomial, 253, 437

KFold, 139–140, 368

KNeighborsClassifier, 66, 362–363

KNeighborsRegressor, 91

knn_statistic, 394–395

Knuth, Donald, 83

kurtosis, 466

L

L1 regularization, see lasso regression

L2 regularization, see ridge regression

label_binarize, 179, 183

Lasso, 300

lasso regression (L1), 300, 307

blending with ridge regression, 318

selecting features in, 325, 411, 424

learning algorithms, 8

learning curves, 131, 150–152

in sklearn, 157

learning methods

incremental/decremental, 130

nonparametric, 65

parameters of, 115

requiring normalization, 221

learning models, see models

learning systems, 9–10

building, 13–15, 366

choosing, 81

combining multiple, see ensembles

evaluating, 11–13, 109–157

from examples, 4, 9–11

performance of, 102

overestimating, 109

tolerating mistakes in data, 16

used with kernel methods, 438

learning_curve, 150–152

least-squares fitting, 101

leave-one-out cross-validation (LOOCV), 140–142

length-normalized covariance, 463

length-normalized dot product, 462–463

Levenshtein distance, 464

liblinear, 291–292

libsvm, 291, 443, 465

Lift Versus Random curve, 189, 193

limited capacity, 109–110, 117

limited resources, 187

linalg.svd (NumPy), 455

line magic, 75

linear algebra, 452, 457, 465

linear combination, 28

Linear Discriminant Analysis (LDA), 282–285, 495

linear kernel, 253, 438

linear regression (LR), 91–97, 305

bias of, 350

bias-variance for, 146–147

calculating predicted values with, 97, 265

comparing to k-NN-R, 102–104, 229

complexity of, 119–123

default metric for, 209

example of, 118

for nonlinear data, 285

from raw materials, 500–504

good old-fashioned (GOF), 300–301, 307, 519–521

graphical presentation of, 504

performing, 97

piecewise, 309–313

regularized, 296–301

relating to k-NN, 147–148

selecting features in, 425

using standardized data for, 105

viewed as ensembles, 405

linear relationships, 415, 417

linearity, 285

LinearRegression, 371

LinearSVC, 253, 291, 465

lines, 34–39

between classes, 250

drawing through points, 92, 237–238

finding the best, 98–101, 253, 268–269, 350, 410, 448–449, 457, 465

piecewise, 313

sloped, 37, 94–97

straight, 91

limited capacity of, 122

local visual words, 483–488

extracting, 485–487

finding synonyms for, 487–488

log-odds, 259, 262–266

predicting, 505–508

logistic regression (LogReg), 259–269, 287, 290–292

and loss, 526

calculating predicted values with, 265

for nonlinear data, 285

from raw materials, 504–509

kernelized, 436

performance of, 429

PGM view of, 523–525

solving perfectly separable classification problems with, 268–269

LogisticRegression, 267, 292

logreg_loss_01, 507

lookup tables, 13

loss, 125–126, 295

defining, 501

hinge, 301–305, 465

minimizing, 526

vs. score, 127, 207

M

M method, 183–185, 197, 200, 202

machine learning

and math, 19–20

definition of, 4

limits of, 15

running on GPUs, 82

macro, 168

macro precision, 168

magical_minimum_finder, 500–511

make_cost, 502–503

make_scorer, 185, 196, 208

Manhattan distance, 82, 367

manifolds, 459–462

differentiable, 466–467

Mann-Whitney U statistic, 202

margin errors, 254

mathematics

1-based indexing in, 54

Σ notation, 30

derivatives, 526

eigenvalues and eigenvectors, 456–457

linear algebra, 452, 457, 465

matrix algebra, 82, 465–466

optimization, 500

parameters, 318

matplotlib, 20, 22, 222–223

matrices, 456

breaking down, 457

decomposition (factorization), 452, 455

identity, 465

multiplication of, 82, 465

orthogonal, 465–466

squaring, 466

transposing, 465

Matrix, The, 67

matshow, 275–277

max_depth, 242

maximum margin separator, 252

mean, 54, 85, 271, 446

arithmetic, 170, see also average

bootstrap, 391–393

computing, 390–391, 395

definition of, 88

distribution of, 390–391

empirical, 457

for two variables, multiplying, 271

geometric, 170

harmonic, 170

multiple, for each train-test split, 231

predicting, 147, 205

weighted, 89–90

mean absolute error (MAE), 209

mean squared error (MSE), 91, 101, 130, 209

mean_squared_error, 91, 126

measurements

accuracy of, 27

critical, 16

errors in, 15, 142–143, 241

levels of, 18

overlapping, 410

rescaling, 328, 414

scales of, 412–414

median, 206, 446

computing on training data, 349

definition of, 88

predicting, 205

median absolute error, 209

medical diagnosis, 10

assessing correctness of, 11–12

confusion matrix for, 165–166

example of, 67

for rare diseases, 160, 163, 178

memory

constraints of, 325

cost of, 71

measuring, 12, 76

relating to input size, 72

shared between programs, 76–77

testing usage of, 77–81, 102–104

memory_profiler, 78

merge, 334

meta level, 4, 17

methods

baseline, 159–161

chaining, 166

metrics.accuracy_score, 62

metrics.mean_squared_error, 91

metrics.roc_curve, 174, 179

metrics.SCORERS.keys(), 161–162, 208

micro, 168

Minkowski distance, 63, 82, 367

MinMaxScaler, 327

mistakes, see errors

Mitchell, Tom, 18

Moby Dick, 13

mode value, 446

models, 8, 66

additive, 318

bias of, 144–145

building, 14

combining, 390–398

comparing, 14

concrete, 371

evaluating, 14, 110

features working well with, 423–426, 464

fitting, 359–361, 363, 367, 370

fully defined, 371

keeping simple, 126, 295

not modifying the internal state of, 8, 361

performance of, 423

selecting, 113–114, 361–362

variability of, 144–145

workflow template for, 67, 90

Monte Carlo, see randomness

Monte Carlo cross-validation, see repeated train-test splitting (RTTS)

Morse code, 417

most_frequent, 160–161

multiclass learners, 179–185, 195–201

averaging, 168–169

mutual information, 418–423, 464

minimizing, 466

mutual_info_classif, 419, 421–422

mutual_info_regression, 420–421

N

Naive Bayes (NB), 68–70, 292

bias-variance for, 148

evaluating, 70–71

in text classification, 69

performance of, 74–76, 78–81, 191

natural language processing (NLP), 9

nearest neighbors, see k-Nearest Neighbors

Nearest Shrunken Centroids (NSC), 292

NearestCentroids, 292

negative outcome, 163–164

nested cross-validation, 157, 370–377

Netflix, 117

neural networks, 512–516, 526

newsgroups, 476

Newton’s Method, 292

No Free Lunch Theorem, 290

noise, 15, 117

addressing, 350, 353–356

capturing, 122, 124, 126

distracting, 109–110, 296

eliminating, 144

manipulating, 117

non-normality, 350

nonic, 120

nonlinearity, 285

nonparametric learning methods, 65

nonprimitive events, see compound events

normal distribution, 27, 520–524

normal equations, 101

normalization, 221, 322, 356, 474–476

Normalizer, 475

np_array_fromiter, 491–492, 494–495

np_cartesian_product, 41

numbers

binary vs. decimal, 53

floating-point, 52–53

numerical features, 6–7, 18, 225, 343–344, 346

predicting, 10–11

NumPy, 20

np.corrcoef, 416

floating-point numbers in, 52–53

np.array, 276, 494

np.dot, 29–30, 38, 47–52

np.fromiter, 494

np.histogram, 21

np.linalg.svd, 455

np.polyfit, 119

np.random.randint, 21

np.searchsorted, 310

NuSVC, 253–257, 291

Nystroem kernel, 436

O

Occam’s razor, 124, 284

odds

betting, 259–262

probability of, 262–266

one-hot coding, 333–341, 347, 356, 526

one-versus-all (OvA), 169

one-versus-one (OvO), 181–182, 253

one-versus-rest (OvR), 168, 179–182, 253, 267

OneHotEncoder, 333

OpenCV library, 485

optimization, 156, 497–500, 526

premature, 83

ordinal regression, 18

outcome, outputs, see targets

overconfidence, 109–110

and resampling, 128

overfitting, 117, 122–126, 290, 296

of base models, 396

P

pairplot, 86

pandas, 20

pd.cut, 328, 330–331

DataFrame, 323

one-hot coding in, 333–334

vs. sklearn, 323, 332

parabolas, 45

finding the best fit, 119–123

piecewise, 313

parameters, 115

adjusting, 116

choosing, 359

in computer science vs. math, 318

shuffling, 368

tuning, 362

vs. arguments, 362

vs. explicit values, 360–361

Pareto principle, 83

partitions, 242

patsy, 334–340, 344–347

connecting sklearn and, 347–348

documentation for, 356

PayPal, 189

PCA, 449–452

peeking, 225

penalization, see complexity

penalties, 300, 306, 502

percentile, 206

performance, 102

estimating, 382

evaluating, 131, 150–152, 382

measuring, 74–76, 78–81, 173, 178

overestimating, 109

physical laws, 17

piecewise constant regression, 309–313, 318

implementing, 310

preprocessing inputs in, 341

vs. k-NN-R, 310

PiecewiseConstantRegression, 313

Pipeline, 378–379

pipelines, 224–225, 377–382

integrating feature selection with, 426–428

plain linear model, 146, 147

planes, 39–41

finding the best, 410, 457

playing cards, 21

plots, 40, 41

plus-one trick, 38, 43–45, 336, 521

points in space, 34–43, 82

polyfit, 119

polynomial kernel, 253

polynomials

degree of, 119, 124

quadratic, 45

positive outcome, 163–164

precision, 165

macro, 168

tradeoffs between recall and, 168, 170–173, 185–187, 202

precision-recall curve (PRC), 185–187, 202

predict, 224–225, 379, 490–491

predict_proba, 174–175

predicted values, 10–11, 33

calculating, 97, 265

prediction bar, 170–177, 186

predictions, 165

combining, 389, 395, 405

evaluating, 215–217

flipping, 202

probability of, 170

real-world cost of, 56

predictive features, 7

predictive residuals, 219

predictors, see features

premature optimization, 83

presumption of innocence, 12

prime factorization, 452

primitive events, 22–23

principal components analysis (PCA), 445–462, 465–466

feature engineering in, 324

using dot products, 458–459, 461

prior, 160–161

probabilistic graphical models (PGMs), 516–525

and linear regression, 519–523

and logistic regression, 523–525

probabilistic principal components analysis (PPCA), 466

probabilities, 21–27

conditional, 24, 25

distribution of, 25–27, 290

expected value of, 31–32

of independent events, 23, 69

of primitive events, 22

of winning, 259–266

processing time, see time

programs

bottlenecks in, 83

memory usage of, 76–77

Provost, Foster, 18

purchasing behavior, predicting, 11

pydotplus, 245

pymc3, 519–521

Pythagorean theorem, 63

Python

indexing semantics in, 21, 54

list comprehension in, 136

memory management in, 77

using modules in the book, 20

Q

Quadratic Discriminant Analysis (QDA), 282–285

quadratic polynomials, see parabolas

quantile, 206

Quinlan, Ross, 239, 244

R

R2 metric, 209–214

for mean model, 229

limitations of, 214, 233–234

misusing, 130

randint, 369

random events, 21–22

random forests (RFs), 396–398

comparing, 403

extreme, 397–398

selecting features in, 425

random guess strategy, 98–99, 101

random sampling, 325

random step strategy, 99, 101

random.randint, 21

random_state, 139–140

RandomForestClassifier, 425

RandomizedSearchCV, 369

randomness, 16

affecting data, 143

for feature selection, 423

for hyperparameters, 368–370

inherent in decisions, 241

pseudo-random, 139

to generate train-test splits, 133, 138–139

rare diseases, 160, 163, 178

rbf, 467

reality, 165

comparing to predictions, 215–217

recall, 165

tradeoffs between precision and, 168, 170–173, 185–187, 202

Receiver Operating Characteristic (ROC) curves, 172–181, 192, 202

and multiclass problem, 179–181

area under, 177–178, 182–193, 202

binary, 174–177

patterns in, 173–174

recentering, see data, centering

rectangles

areas of, 275

drawing, 275–278

overlapping, 243

recursive feature elimination, 425–426

redundancy, 324, 340

regression, 7, 64, 85–105

comparing methods of, 306–307

definition of, 85

examples of, 10–11

metrics for, 208–214

ordinal, 18

regression trees, 313–314

RegressorMixin, 311

regressors

baseline, 205–207

comparing, 314–317

default metric for, 209

evaluating, 205–234

implementing, 311–313

performance of, 317

scoring for, 130

regularization, 296–301

performing, 300–301

regularized linear regression, 296–301, 305

reinforcement learning, 18

repeated train-test splitting (RTTS), 133–139, 156

resampling, 128, 156, 390

with replacement, 157, 391–392

without replacement, 391

rescaling, see scaling, standardizing

reshape, 333

residual plots, 217–221, 232

residuals, 218, 230–232, 350

predictive, 219

Studentized, 232

resources

consumption of, 12–13, 71

limited, 187

measuring, 71–77

needed by an algorithm, 72

utilization in regression, 102–104

RFE, 425

Ridge, 300

ridge regression (L2), 300, 307

blending with lasso regression, 318

rolling dice, 21–24

expected value of, 31–32

rigged, 68–69

root mean squared error (RMSE), 101

calculating, 119

comparing regressors on, 315

high, 142

size of values in, 136

rvs, 369

S

sampling, see resampling

Samuel, Arthur, 3–4, 17

scaling, 322, 325–329

statistical, 326

scipy.stats, 369

scores, 127, 130

extracting from CV classifiers, 192

for each class, 181

vs. loss, 207

scoring function, 184

Seaborn, 20

pairplot, 86

tsplot, 151

searchsorted, 310

SelectFromModel, 424–425

selection, 113–114

SelectPercentile, 422

sensitivity, 173, 185

SGDClassifier, 267, 292

shrinkage, see complexity

shuffle, 368

ShuffleSplit, 137–139

shuffling, 137–140, 382

SIFT_create, 485

signed area, 275

Silva, Alice, 195, 203

similarity, 63–64

simple average, 30

simplicity, 124

singular value decomposition (SVD), 452, 465–466

sklearn, 19–20

3D datasets in, 460–461

baseline models in, 205

boosters in, 400

classification metrics in, 161–163, 208–209

classifiers in, 202

common interface of, 379

confusion matrix in, 173

connecting patsy and, 347–348

consistency of, 225

cross-validation in, 129–130, 132, 184

custom models in, 311

distance calculators in, 64

documentation of, 368

feature correlation in, 416–417

feature evaluation in, 463

feature selection in, 425

kernels in, 435–437, 481

learners in, 318

linear regression in, 300, 310

logistic regression in, 267

naming conventions in, 207, 362

normalization in, 356

PCA in, 449–452

pipelines in, 224–225

plotting learning curves in, 157

R2 in, 210–214, 233–234

random forests in, 396, 407

sparse-aware methods in, 356

storing data in, 333

SVC in, 253

SVR in, 307

terminology of, 61, 66, 127, 160

text representation in, 471–479, 494

thresholds in, 176

using alternative systems instead, 119

using OvR, 253

vs. pandas, 323, 332

workflow in, 67, 90

skms.cross_validate, 226–227

skpre.Normalizer, 495

Skynet, 389

smart step strategy, 99–101, 267

smoothness, 308, 406, see also complexity, regularization

sns.pairplot, 58

softmax function, 526

sorted lists, 465

sparsity, 333, 356

specificity, 165, 173, 185

splines, 318

spread, see standard deviation

square root of the sum of squared errors, 93

squared error loss, 301

squared error points, 209

ss.geom, 369

ss.normal, 369

ss.uniform, 369

StackExchange, 465

stacking, 390

StackOverflow, 292

standard deviation, 54, 85, 221, 327

standardization, 85, 105, 221–225, 231, 327

StandardScaler, 223–225, 326–327

stationary learning tasks, 16

statistics, 87

coefficient of determination, 130, 209

distribution of the mean, 391

dummy coding, 334

for feature selection, 463

Studentized residuals, 232

variation in data, 451

statsmodels, 292, 338–341

documentation for, 356

Stochastic Gradient Descent (SGD), 267

stocks

choosing action for, 9

predicting pricing for, 11

stop words, 472–473, 494

storage space

cost of, 12–13, 71

measuring, 72

stratification, 132–133

stratified, 160–161

StratifiedKFold, 130, 403

strings, comparing, 438–439

stripplots, 135, 155

student performance, 195–201, 203, 225–226

comparing regressors on, 314–317

predicting, 10

Studentized residuals, 232

studying for a test, 109, 116–117

sum, weighted, 28, 31

sum of probabilities of events

all primitive, 22

independent, 23

sum of squared errors (SSE), 33–34, 93–94, 210–212, 271, 301

smallest, 100

sum of squares, 32–33

sum product, 30

summary statistic, 87

supervised learning from examples, 4, 9–11

Support Vector Classifiers (SVCs), 252–259, 290–291, 301, 442

bias-variance in, 256–259

boundary in, 252

computing, 291

for nonlinear data, 285–287

maximum margin separator in, 305

parameters for, 254–256

performance of, 429

Support Vector Machines (SVMs), 252, 291, 442, 465

feature engineering in, 324

from raw materials, 510–511

vs. the polynomial kernel, 437

Support Vector Regression (SVR), 301–307

main options for, 307

support vectors, 252, 254

supporting examples, 252

SVC, 253–259, 291, 438

synonyms, 482–483, 487–488

T

T-distributed Stochastic Neighbor Embedding (TSNE), 462

t-test, 463

tabular data, 470

targets, 67

cooperative values of, 296

discontinuous, 308

predicting, 397

training vs. testing, 6061

transforming, 350, 353–356

task understanding, 14

tax brackets, 322, 331

teaching to the test, 59–60, 114

in picking a learner, 112–113

protecting against, 110–111, 372, 377

TensorFlow, 82

term frequency-inverse document frequency (TF-IDF), 475–477, 495

testing datasets, 60–61, 110, 114

predicting on, 66

resampling, 128

size of, 115, 130

testing phase, see assessment, selection

tests

positive vs. negative, 163–166

specificity of, 165

text, 470–479

classification of, 69

encoding, 471–476

representing as table rows, 470–471

TfidfVectorizer, 475, 478, 495

Theano, 82

time

constraints of, 325

cost of, 13, 71

measuring, 12, 72, 74–75

relating to input size, 72

time series, plotting, 151

timeit, 74–75, 83

todense, 333, 473

Tolkien, J. R. R., 290

total distance, 94

tradeoffs, 13

between bias and variance, see bias-variance tradeoffs

between complexity and errors, 126

between false positives and negatives, 172

between precision and recall, 168, 170–173

train-test splits, 60, 110, 115

evaluating, 70–71, 152

for cross-validation, 132

multiple, 128

randomly selected, 370

repeated, 133–139, 156

train_test_split, 60, 70–71, 79, 349

training datasets, 60–61, 110, 114

duplicating examples by weight in, 399

fitting estimators on, 66

randomly selected, 370

resampling, 128

size of, 115, 130–131, 150

unique identifiers in, 241, 322

training error, 60

training loss, 125–126, 296

training phase, 113

transform, 224–225

Transformer, 435–436

TransformerMixin, 348, 379

transformers, 348–350

for images, 491–493

treatment coding, see one-hot coding

tree-building algorithms, 244

trigrams, 471

true negative rate (TNR), 164–166

true positive rate (TPR), 164–166, 173–181

Trust Region Newton’s Method, 292

tsplot, 151

Twenty Newsgroups dataset, 476

two-humped camel, see data, multimodal

U

unaccounted-for differences, 350

underfitting, 117, 122–125, 296

uniform, 160–161

unigrams, 471

unique identifiers, 241, 322, 324

univariate feature selection, 415

unsupervised activities, 445

V

validation, 110, 156, see also cross-validation

validation sets (ValS), 114

randomly selected, 370

size of, 115

values

accuracy of, 15

actual, 33

baseline, 356

definition of, 5

discrete, 56

explicit, vs. function parameters, 360–361

finding the best, 98–101, 267

missing, 18, 322

numerical, 6–7, 18, 86, 225

predicting, 64, 85, 87, 91

predicted, 10–11, 33, 97, 265

target, 67

cooperative, 296

transforming, 350

under- vs. overestimating, 33

variance, 110, 271, 292

always positive, 272

in feature values, 412–415

in SVCs, 256–259

maximizing, 448–449, 451

not affected by data shifting, 451

of data, 143, 145, 445

of model, 144–145

reducing, 396, 400, 406

VarianceThreshold, 413

vectorizers, 495

verification, 156

vocabularies, 482

global, 487

votes, weighted, 390

VotingClassifier, 407

W

warp functions, 440

weighted

average, 31–32, 34, 89

data, 399–400

errors, 399

mean, 89–90

sum, 28, 31

votes, 390

weights

adjusting, 497–500

distributions of, 524

pairs of, 524

restricting, 105, 146

total size of, 297

whuber (StackOverflow user), 292

wine dataset, 412–414, 426–428, 449

winning, odds of, 259–262

Wittgenstein, Ludwig, 18

words

adjacent, 471

counts of, 471, 473

frequency of, 474–476

in a document, 471

stop, 472–473, 494

visual, 491

global, 483, 487–490

local, 483–488

World War II, 172

wrapping functions, 361, 502

X

xgboost, 406

xor function, 341–343

Y

YouTube, 54, 109

Z

z-scoring, see standardizing

zip, 30
