for the circle problem, 464
80–20 rule, 83
Σ, in math, 30
accuracy
calculating, 62
fundamental limits of, 163
accuracy_score, 62
AdaBoostClassifier, 400, 403–406
additive model, 318
aggregation, 390
algorithms
analysis of, 72
genetic, 101
less important than data, 15
amoeba (StackExchange user), 465
analytic learning, 18
Anderson, Edgar, 56
ANOVA test, 463
area under the curve (AUC), 177–178, 182–193, 202
arguments, of a function, 362
arithmetic mean, 170, see also average
assumptions, 55, 270, 282, 286–287, 439
average
computing from confusion matrix, 170
simple, 30
average centered dot product, see covariance
background knowledge, 322, 331, 439
bag of global visual words (BoGVW), 483, 488–490
bag of visual words (BoVW), 481–483
bagged classifiers
creating, 394
implementing, 407
bagging
basic algorithm for, 395
bias-variance in, 396
BaggingRegressor, 407
base models
overfitting, 396
well-calibrated, 407
BaseEstimator, 311
baseline methods, 159–161, 189, 191
baseline values, 356
basketball players, 397
Bayes optimal classifier, 464
bias
in combined models, 390
number of, 148
bias-variance tradeoffs, 145–149, 154, 396
in decision trees, 249
in performance estimation, 382
big data, 71
Big-O analysis, 82
bigrams, 471
binary classification, 55, 174, 267
confusion matrix for, 164
binomials, 524
bivariate correlation, 415
black holes, models of, 467
body mass index (BMI), 322, 410–411
bootstrap aggregation, see bagging
Box, George, 69
C4.5, C5.0, CART algorithms, 244
calculated shortcut strategy, 100–101, 104
Calvinball game, 67
card games, 21
case-based reasoning, 18
categorical features, 5–7, 18, 346
categories, 332
Cauchy-Schwarz inequality, 463
causality, 233
Celsius, converting to Fahrenheit, 325–326
center, see mean
classification_report, 169–170
ClassifierMixin, 202
classifiers
evaluating, 70–71, 159–203, 238–239
smart, 189
on subsets of features, 494
coefficient of determination, 130
coin flipping, 21
and binomials, 524
collections, 20
combinations, 41
combinatorics, 423
complexity
cost of increasing, 12
trading off for errors, 125–126, 295–301
complexity analysis, 82
compression, 13
computational learning theory, 15
computer graphics, 82
computer memory, see memory
computer science, 362
confounding factors, 233
confusion matrix, 164, 171–178
computing averages from, 168, 170
constant linear model, 146
contrast coding, 356
conveyor belt, 377
convolutional neural network, 516
corpus, 472
corrcoef (NumPy), 416
correlation, 415–417, 423, 464
costs
comparing, for different models, 127
of predictions, 56
CountVectorizer, 473
covariance
between all pairs of features, 278
exploring graphically, 292
length-normalized, 463
not affected by data shifting, 451
covariance matrix (CM), 279–283, 451, 456
computing, 455
diagonal, 281
eigendecomposition of, 452, 456, 459
CRISP-DM process, 18
cross-validation (CV), 128–131
as a single learner, 230
comparing learners with, 154–155
extracting scores from, 192
feature engineering during, 323
minimum number of examples for, 152
with boosting, 403
wrapping methods inside, 370–372
cross_val_score, 130, 132, 137, 196, 207, 379
Cumulative Response curve, 189
using kernels with, 461
data
accuracy of, 15
big, 71
centering, 221, 322, 325, 445–447, 451, 457
cleaning, 323
collecting, 14
converting to tabular, 470
fuzzy towards the tails, 88
geometric view of, 410
incomplete, 16
making assumptions about, 270
modeling, 14
more important than algorithms, 15
noisiness of, 15
nonlinear, 285
preparing, 14
preprocessing, 341
reducing, 250–252, 324–325, 461
standardized, 105, 221–225, 231, 315–316, 447
synthetic, 117
total amount of variation in, 451
transforming, see feature engineering
datasets
applying learners to, 394
examples in, 5
features in, 5
finding relationships in, 445
missing values in, 322
poorly represented classes in, 133
reducing, 449
single, distribution from, 390
testing, see testing datasets
training, see training datasets
datasets.load_boston, 105, 234
datasets.load_breast_cancer, 84, 203
datasets.load_digits, 319
decision trees (DT), 239–249, 290–291, 464
bagged, 395
bias-variance tradeoffs in, 249
flexibility of, 313
prone to overfitting, 241
selecting features in, 325, 412
unique identifiers in, 241, 322
viewed as ensembles, 405
vs. random forests, 396
DecisionTreeClassifier, 247
deep neural networks, 481
democratic legislature, 388
dependent variables, see targets
deployment, 14
Descartes, René, 170
diabetes dataset, 85, 105, 322, 416
diagonal covariance matrix, 281
Diagonal Linear Discriminant Analysis (DLDA), 282–285, 292
diagrams, drawing, 245
Dietterich, Tom, 375
Dijkstra, Edsger, 54
dimensionality reduction
with PCA, 451
discontinuous target, 308
discriminant analysis (DA), 269–287, 290–292
distances
as weights, 90
sum product of, 275
total, 94
distributions
binomial, 524
from a single dataset, 390
random, 369
domain knowledge, see background knowledge
dot, 29–30, 38, 47–52, 245, 455
dot products, 29–30, 38, 47–52
advantages of, 43
and kernels, 438–441, 458–459, 461
average centered, see covariance
double cross strategy, 375
dual problem, solving, 459
dummy coding, see one-hot coding
dummy methods, see baseline methods
educated guesses, 71
eigendecomposition (EIGD), 452, 456, 458, 465–466
eigenvalues and eigenvectors, 456–457
Einstein, Albert, 124
ElasticNet, 318
empirical loss, 125
enterprises, competitive advantages of, 16
entropy, 464
enumerate, 494
errors
between predictions and reality, 350
in data collection process, 322
in measurements, 15, 142–143, 241
margin, 254
measuring, 33
negating, 207
positive, 33
sources of, 145
trading off for complexity, 125–126, 295–301
vs. residuals, 218
vs. score, 207
weighted, 399
estimated values, see predicted values
estimators, 66
Euclidean space, 466
evaluation
deterministic, 142
events
probability distribution of, 25–27
examples, 5
dependent vs. independent, 391
distance between, 63–64, 438–439
duplicating by weight, 399
grouping together, 479
learning from, 4
quantity of, 15
relationships between, 434
supporting, 252
tricky vs. bad, 144
execution time, see time
extract-transform-load (ETL), 323
extrapolation, 71
extreme gradient boosting, 406
F1 calculation, 170
f_classif, 422
factor analysis (FA), 466
factory machines
choosing knob values for, 115, 144, 156, 337
stringing together, 377
with a side tray, 65
Fahrenheit, converting to Celsius, 325–326
failures, in a legal system, 12
fair bet, 259
false negative rate (FNR), 164–166
false positive rate (FPR), 164–166, 173–181
Fawcett, Tom, 18
feature construction, 322, 341–350, 410–411
how to perform, 324
limitations of, 377
feature selection, 322, 324–325, 410–428, 449
by importance, 425
formal statistics for, 463
integrating with a pipeline, 426–428
modelless, 464
feature-and-split
feature-pairwise Gram matrix, 464
features, 5
causing targets, 233
conditionally independent, 69
counterproductive, 322
covariant, 270
different, 63
numerical, 6–7, 18, 225, 343–344, 346
relationships between, 417
sets of, 423
standardizing, 85
Fisher’s Iris Dataset, see iris dataset
Fisher, Sir Ronald, 56
fit, 224–225, 337, 363, 367–368, 371–372, 379, 381
fit-estimators, 66
fit_intercept, 340
flash cards, 398
flashlights, messaging with, 417–418
flat surface, see planes
flipping coins, 21
and binomials, 524
fmin, 500
folds, 128
forward stepwise selection, 463
fromiter (NumPy), 494
full joint distribution, 148
functions
parameters of
vs. arguments, 362
functools, 20
fundraising campaign, 189
future, predicting, 7
fuzzy specialist scenario, 405
gain curve, see Lift Versus Random curve
games
expected value of, 32
fair, 259
sets of rules for, 67
Gaussian Naive Bayes (GNB), 82, 282–287
genetic algorithms, 101
geometric mean, 170
get_support, 413
Ghostbusters, 218
Galton regression, 7
global visual words, 483, 487–490
good old-fashioned (GOF) linear regression, 300–301, 519–521
and complex problems, 307
gradient descent (GD), 101, 292
GradientBoostingClassifier, 400, 403–406
Gram matrix, 464
graphics processing units (GPUs), 71, 82
greediness, for feature selection, 423–424
Hamming distance, 63
Hand and Till M method, 183–185, 197, 200, 202
harmonic mean, 170
Hettinger, Raymond, 54
hist, 22
histogram, 21
hold-out test set (HOT), 114–115
hyperparameters
adjusting, 116
choosing, 359
cross-validation for, 371–377, 380–382
for tradeoffs between complexity and errors, 126
overfitting, 370
random combinations of, 368–370
hyperplanes, 39
IBM, 3
ID3 algorithm, 244
identification variables, 241, 322, 324
illusory correlations, 233
images
classification of, 9
import, 19
in-sample evaluation, 60
independence, 23
independence assumptions, 148
independent component analysis (ICA), 466
independent variables, see features
indicator function, 243
inductive logic programming, 18
infinity-norm, 367
information gain, 325
information theory, 417
input features, 7
inputs, see features
intercept
avoiding, 356
International System of Units (SI), 73
iris dataset, 56–58, 60–61, 82, 133, 166–168, 174, 190–195, 242, 245, 329–332, 336, 480, 495
IsoMap, 462
iteratively reweighted least squares (IRLS), 291
jackknife resampling, 157
Jupyter notebooks, 19
k-fold Cross-Validation (CV), 129–131
with repeated train-test splits, 137
k-Means Clustering (k-MC), 479–481
k-Nearest Neighbors (k-NN), 64–67
algorithm of, 63
bias-variance for, 145
combining values from, 64
for nonlinear data, 285
performance of, 74–76, 78–81, 429–430
picking the best k, 113, 116, 154, 363–365
k-Nearest Neighbors classification (k-NN-C), 64
k-Nearest Neighbors regression (k-NN-R), 87–91
comparing to linear regression, 102–104, 147, 229
evaluating, 221
vs. piecewise constant regression, 310
Kaggle website, 406
Keras, 82
kernel matrix, 438
kernel methods, 458
learners used with, 438
mock-up, 437
kernels
and dot products, 438–441, 458–459, 461
approximate vs. full, 436
feature construction with, 428–445
KNeighborsClassifier, 66, 362–363
KNeighborsRegressor, 91
Knuth, Donald, 83
kurtosis, 466
L1 regularization, see lasso regression
L2 regularization, see ridge regression
Lasso, 300
lasso regression (L1), 300, 307
blending with ridge regression, 318
selecting features in, 325, 411, 424
learning algorithms, 8
in sklearn, 157
learning methods
incremental/decremental, 130
nonparametric, 65
parameters of, 115
requiring normalization, 221
learning models, see also models
choosing, 81
combining multiple, see ensembles
performance of, 102
overestimating, 109
tolerating mistakes in data, 16
used with kernel methods, 438
least-squares fitting, 101
leave-one-out cross-validation (LOOCV), 140–142
length-normalized covariance, 463
length-normalized dot product, 462–463
Levenshtein distance, 464
Lift Versus Random curve, 189, 193
limited capacity, 109–110, 117
limited resources, 187
linalg.svd (NumPy), 455
line magic, 75
linear combination, 28
Linear Discriminant Analysis (LDA), 282–285, 495
linear regression (LR), 91–97, 305
bias of, 350
calculating predicted values with, 97, 265
comparing to k-NN-R, 102–104, 229
default metric for, 209
example of, 118
for nonlinear data, 285
good old-fashioned (GOF), 300–301, 307, 519–521
graphical presentation of, 504
performing, 97
selecting features in, 425
using standardized data for, 105
viewed as ensembles, 405
linear relationships, 415, 417
linearity, 285
LinearRegression, 371
lines
between classes, 250
drawing through points, 92, 237–238
finding the best, 98–101, 253, 268–269, 350, 410, 448–449, 457, 465
piecewise, 313
straight, 91
limited capacity of, 122
logistic regression (LogReg), 259–269, 287, 290–292
and loss, 526
calculating predicted values with, 265
for nonlinear data, 285
kernelized, 436
performance of, 429
solving perfectly separable classification problems with, 268–269
logreg_loss_01, 507
lookup tables, 13
loss functions
defining, 501
minimizing, 526
M method, 183–185, 197, 200, 202
machine learning
and math, 19–20
definition of, 4
limits of, 15
running on GPUs, 82
macro, 168
macro precision, 168
magical_minimum_finder, 500–511
Mann-Whitney U statistic, 202
margin errors, 254
mathematics
1-based indexing in, 54
Σ notation, 30
derivatives, 526
eigenvalues and eigenvectors, 456–457
optimization, 500
parameters, 318
matrices, 456
breaking down, 457
decomposition (factorization), 452, 455
identity, 465
squaring, 466
transposing, 465
Matrix, The, 67
max_depth, 242
maximum margin separator, 252
mean
arithmetic, 170, see also average
definition of, 88
empirical, 457
for two variables, multiplying, 271
geometric, 170
harmonic, 170
multiple, for each train-test split, 231
mean absolute error (MAE), 209
mean squared error (MSE), 91, 101, 130, 209
measurements
accuracy of, 27
critical, 16
levels of, 18
overlapping, 410
median
computing on training data, 349
definition of, 88
predicting, 205
median absolute error, 209
medical diagnosis, 10
assessing correctness of, 11–12
for rare diseases, 160, 163, 178
memory
constraints of, 325
cost of, 71
relating to input size, 72
shared between programs, 76–77
testing usage of, 77–81, 102–104
memory_profiler, 78
merge, 334
methods
chaining, 166
metrics.accuracy_score, 62
metrics.mean_squared_error, 91
metrics.SCORERS.keys(), 161–162, 208
micro, 168
Minkowski distance, 63, 82, 367
MinMaxScaler, 327
mistakes, see errors
Mitchell, Tom, 18
Moby Dick, 13
mode value, 446
models
additive, 318
building, 14
comparing, 14
concrete, 371
features working well with, 423–426, 464
fitting, 359–361, 363, 367, 370
fully defined, 371
not modifying the internal state of, 8, 361
performance of, 423
Monte Carlo, see randomness
Monte Carlo cross-validation, see repeated train-test splitting (RTTS)
Morse code, 417
multiclass learners, 179–185, 195–201
mutual information, 418–423, 464
minimizing, 466
Naive Bayes (NB)
bias-variance for, 148
in text classification, 69
performance of, 74–76, 78–81, 191
natural language processing (NLP), 9
nearest neighbors, see k-Nearest Neighbors
Nearest Shrunken Centroids (NSC), 292
NearestCentroids, 292
nested cross-validation, 157, 370–377
Netflix, 117
newsgroups, 476
Newton’s Method, 292
No Free Lunch Theorem, 290
noise
eliminating, 144
manipulating, 117
non-normality, 350
nonic, 120
nonlinearity, 285
nonparametric learning methods, 65
nonprimitive events, see compound events
normal distribution, 27, 520–524
normal equations, 101
normalization, 221, 322, 356, 474–476
Normalizer, 475
np_array_fromiter, 491–492, 494–495
np_cartesian_product, 41
numbers
binary vs. decimal, 53
numerical features, 6–7, 18, 225, 343–344, 346
NumPy, 20
np.corrcoef, 416
floating-point numbers in, 52–53
np.fromiter, 494
np.histogram, 21
np.linalg.svd, 455
np.polyfit, 119
np.random.randint, 21
np.searchsorted, 310
Nystroem kernel, 436
odds
one-hot coding, 333–341, 347, 356, 526
one-versus-all (OvA), 169
one-versus-one (OvO), 181–182, 253
one-versus-rest (OvR), 168, 179–182, 253, 267
OneHotEncoder, 333
OpenCV library, 485
optimization, 156, 497–500, 526
premature, 83
ordinal regression, 18
outcome, outputs, see targets
and resampling, 128
overfitting, 117, 122–126, 290, 296
of base models, 396
pairplot, 86
pandas, 20
DataFrame, 323
parabolas, 45
piecewise, 313
parameters, 115
adjusting, 116
choosing, 359
in computer science vs. math, 318
shuffling, 368
tuning, 362
vs. arguments, 362
Pareto principle, 83
partitions, 242
patsy
connecting sklearn and, 347–348
documentation for, 356
PayPal, 189
peeking, 225
penalization, see complexity
percentile, 206
performance, 102
estimating, 382
measuring, 74–76, 78–81, 173, 178
overestimating, 109
physical laws, 17
piecewise constant regression, 309–313, 318
implementing, 310
preprocessing inputs in, 341
vs. k-NN-R, 310
PiecewiseConstantRegression, 313
pipelines
integrating feature selection with, 426–428
playing cards, 21
plus-one trick, 38, 43–45, 336, 521
polyfit, 119
polynomial kernel, 253
polynomials
quadratic, 45
precision, 165
macro, 168
tradeoffs between recall and, 168, 170–173, 185–187, 202
precision-recall curve (PRC), 185–187, 202
predict, 224–225, 379, 490–491
predictions, 165
flipping, 202
probability of, 170
real-world cost of, 56
predictive features, 7
predictive residuals, 219
predictors, see features
premature optimization, 83
presumption of innocence, 12
prime factorization, 452
principal components analysis (PCA), 445–462, 465–466
feature engineering in, 324
using dot products, 458–459, 461
probabilistic graphical models (PGMs), 516–525
and linear regression, 519–523
and logistic regression, 523–525
probabilistic principal components analysis (PPCA), 466
probability
of primitive events, 22
processing time, see time
programs
bottlenecks in, 83
Provost, Foster, 18
purchasing behavior, predicting, 11
pydotplus, 245
Pythagorean theorem, 63
Python
list comprehension in, 136
memory management in, 77
using modules in the book, 20
Quadratic Discriminant Analysis (QDA), 282–285
quadratic polynomials, see parabolas
quantile, 206
R²
for mean model, 229
misusing, 130
randint, 369
random forests (RFs)
comparing, 403
selecting features in, 425
random guess strategy, 98–99, 101
random sampling, 325
random.randint, 21
RandomForestClassifier, 425
RandomizedSearchCV, 369
randomness, 16
affecting data, 143
for feature selection, 423
inherent in decisions, 241
pseudo-random, 139
to generate train-test splits, 133, 138–139
rbf, 467
reality, 165
comparing to predictions, 215–217
recall, 165
tradeoffs between precision and, 168, 170–173, 185–187, 202
Receiver Operating Characteristic (ROC) curves, 172–181, 192, 202
and multiclass problem, 179–181
area under, 177–178, 182–193, 202
recentering, see data, centering
rectangles
areas of, 275
overlapping, 243
recursive feature elimination, 425–426
regression
definition of, 85
ordinal, 18
RegressorMixin, 311
regressors
default metric for, 209
performance of, 317
scoring for, 130
regularized linear regression, 296–301, 305
reinforcement learning, 18
repeated train-test splitting (RTTS), 133–139, 156
resampling
with replacement, 157, 391–392
without replacement, 391
rescaling, see scaling, standardizing
reshape, 333
residuals
predictive, 219
Studentized, 232
resources
limited, 187
needed by an algorithm, 72
utilization in regression, 102–104
RFE, 425
Ridge, 300
ridge regression (L2), 300, 307
blending with lasso regression, 318
root mean squared error (RMSE), 101
calculating, 119
comparing regressors on, 315
high, 142
size of values in, 136
rvs, 369
sampling, see resampling
scaling
statistical, 326
scipy.stats, 369
scores
extracting from CV classifiers, 192
for each class, 181
vs. loss, 207
scoring function, 184
Seaborn, 20
pairplot, 86
tsplot, 151
searchsorted, 310
SelectPercentile, 422
shrinkage, see complexity
shuffle, 368
SIFT_create, 485
signed area, 275
simple average, 30
simplicity, 124
singular value decomposition (SVD), 452, 465–466
sklearn
baseline models in, 205
boosters in, 400
classification metrics in, 161–163, 208–209
classifiers in, 202
common interface of, 379
confusion matrix in, 173
consistency of, 225
cross-validation in, 129–130, 132, 184
custom models in, 311
distance calculators in, 64
documentation of, 368
feature correlation in, 416–417
feature evaluation in, 463
feature selection in, 425
learners in, 318
linear regression in, 300, 310
logistic regression in, 267
naming conventions in, 207, 362
normalization in, 356
plotting learning curves in, 157
sparse-aware methods in, 356
storing data in, 333
SVC in, 253
SVR in, 307
terminology of, 61, 66, 127, 160
text representation in, 471–479, 494
thresholds in, 176
using alternative systems instead, 119
using OvR, 253
skpre.Normalizer, 495
Skynet, 389
smart step strategy, 99–101, 267
smoothness, 308, 406, see also complexity, regularization
sns.pairplot, 58
softmax function, 526
sorted lists, 465
splines, 318
spread, see standard deviation
square root of the sum of squared errors, 93
squared error loss, 301
squared error points, 209
ss.geom, 369
ss.norm, 369
ss.uniform, 369
StackExchange, 465
stacking, 390
StackOverflow, 292
standard deviation, 54, 85, 221, 327
standardization, 85, 105, 221–225, 231, 327
StandardScaler, 223–225, 326–327
stationary learning tasks, 16
statistics, 87
coefficient of determination, 130, 209
distribution of the mean, 391
dummy coding, 334
for feature selection, 463
Studentized residuals, 232
variation in data, 451
statsmodels
documentation for, 356
Stochastic Gradient Descent (SGD), 267
stocks
choosing action for, 9
predicting pricing for, 11
storage space
measuring, 72
student performance, 195–201, 203, 225–226
comparing regressors on, 314–317
predicting, 10
Studentized residuals, 232
studying for a test, 109, 116–117
sum of probabilities of events
all primitive, 22
independent, 23
sum of squared errors (SSE), 33–34, 93–94, 210–212, 271, 301
smallest, 100
sum product, 30
summary statistic, 87
supervised learning from examples, 4, 9–11
Support Vector Classifiers (SVCs), 252–259, 290–291, 301, 442
boundary in, 252
computing, 291
maximum margin separator in, 305
performance of, 429
Support Vector Machines (SVMs), 252, 291, 442, 465
feature engineering in, 324
vs. the polynomial kernel, 437
Support Vector Regression (SVR), 301–307
main options for, 307
supporting examples, 252
T-distributed Stochastic Neighbor Embedding (TSNE), 462
t-test, 463
tabular data, 470
targets
cooperative values of, 296
discontinuous, 308
predicting, 397
task understanding, 14
teaching to the test, 59–60, 114
protecting against, 110–111, 372, 377
TensorFlow, 82
term frequency-inverse document frequency (TF-IDF), 475–477, 495
testing datasets, 60–61, 110, 114
predicting on, 66
resampling, 128
testing phase, see assessment, selection
tests
positive vs. negative, 163–166
specificity of, 165
text
classification of, 69
representing as table rows, 470–471
TfidfVectorizer, 475, 478, 495
Theano, 82
time
constraints of, 325
relating to input size, 72
time series, plotting, 151
Tolkien, J. R. R., 290
total distance, 94
tradeoffs, 13
between bias and variance, see bias-variance tradeoffs
between complexity and errors, 126
between false positives and negatives, 172
between precision and recall, 168, 170–173
train-test splits, 60, 110, 115
for cross-validation, 132
multiple, 128
randomly selected, 370
train_test_split, 60, 70–71, 79, 349
training datasets, 60–61, 110, 114
duplicating examples by weight in, 399
fitting estimators on, 66
randomly selected, 370
resampling, 128
unique identifiers in, 241, 322
training error, 60
training phase, 113
treatment coding, see one-hot coding
tree-building algorithms, 244
trigrams, 471
true negative rate (TNR), 164–166
true positive rate (TPR), 164–166, 173–181
Trust Region Newton’s Method, 292
tsplot, 151
Twenty Newsgroups dataset, 476
two-humped camel, see data, multimodal
unaccounted-for differences, 350
underfitting, 117, 122–125, 296
unigrams, 471
unique identifiers, 241, 322, 324
univariate feature selection, 415
unsupervised activities, 445
validation, 110, 156, see also cross-validation
validation sets (ValS), 114
randomly selected, 370
size of, 115
values
accuracy of, 15
actual, 33
baseline, 356
definition of, 5
explicit, vs. function parameters, 360–361
cooperative, 296
transforming, 350
under- vs. overestimating, 33
variance
always positive, 272
not affected by data shifting, 451
VarianceThreshold, 413
vectorizers, 495
verification, 156
vocabularies, 482
global, 487
votes, weighted, 390
VotingClassifier, 407
warp functions, 440
weighted
errors, 399
votes, 390
weights
distributions of, 524
pairs of, 524
total size of, 297
whuber (StackOverflow user), 292
wine dataset, 412–414, 426–428, 449
Wittgenstein, Ludwig, 18
words
adjacent, 471
in a document, 471
visual, 491
World War II, 172
xgboost, 406
z-scoring, see standardizing
zip, 30