evaluating categorical metrics 329 – 333
evaluating continuous metrics 319 – 323
using alternative displays and tests 325 – 329
dangers of data silo 445 – 446
process over technology 442 – 445
training and inference skew 440 – 441
end user vs. internal use testing 453 – 460
fallbacks and cold starts 447 – 452
leaning heavily on prior art 448 – 450
model interpretability 460 – 469
Shapley additive explanations 461 – 463
using shap 463 – 466, 469
ACID-compliant storage layer 442
Agile software engineering 31 – 35
communication and cooperation 33 – 35
embracing and expecting change 35
algorithmic complexity 536 – 539
ALS (alternating least squares) 44
approximate Shapley value estimation 461 – 463
ARIMA, rapid testing for 184 – 186
interfacing with model registry 476 – 481
MLflow model registry 474 – 475
asynchronous concurrency 239 – 241
attribution measurement 302 – 316
clarifying correlation vs. causation 312 – 316
leveraging A/B testing for calculations 316 – 333
evaluating categorical metrics 329 – 333
evaluating continuous metrics 319 – 323
using alternative displays and tests 325 – 329
prediction performance 302 – 310
autoregressive parameters (p) variable 184
baseline comparison visualization 164 – 167
Becoming Agile in an Imperfect World (Smith and Sidky) 113
BI (business intelligence) style queries 194
algorithmic complexity for ML 536 – 539
analyzing decision-tree complexity 531 – 536
branch strategies, logging 234 – 236
bulk external delivery 500 – 502
delivery consistency 500 – 501
business rules chaos 116 – 120
cargo cult ML (machine learning) behavior 432 – 437
CDD (chaos-driven development) 115
CI/CD (continuous integration/continuous deployment) system 194
clean experimentation environment 540 – 541
CNNs (convolutional neural networks) 14
designing modular ML 257 – 264
handling errors the right way 286 – 288
excessively nested logic 292 – 297
global mutable objects 288 – 291
encapsulation to prevent mutable side effects 290 – 291
naming conventions and structure 273 – 274
avoiding cargo cult ML (machine learning) behavior 432 – 437
monitoring everything in model life cycle 417 – 421
wireframing ML projects 426 – 431
setting guidelines in 163 – 172
baseline comparison visualization 164 – 167
collections, polynomial relationship and 524 – 529
communication 33 – 35, 76 – 123, 163
business rules chaos 116 – 120
project-based meetings 89 – 93
setting critical discussion boundaries 94 – 100
working with SMEs (subject-matter experts) 89
meeting with cross-functional teams 101 – 108
development progress reviews 105
experimental update meeting 102 – 103
preproduction review 107 – 108
SMEs (subject-matter experts) review/prototype review 103 – 104
setting limits on experimentation 108 – 116
CDD (chaos-driven development) 115
maintainability and extensibility 112 – 113
PDD (prayer-driven development) 114 – 115
RDD (resume-driven development) 115 – 116
TDD (test-driven development) or FDD (feature-driven development) 113
lightweight scripted style 357 – 361
asynchronous concurrency 239 – 241
constructIndexers() method 363
creating container-based pristine environment for experimentation 543 – 544
continuous integration/continuous deployment (CI/CD) system 194
cross-functional teams 101 – 108
development progress reviews 105
experimental update meeting 102 – 103
preproduction review 107 – 108
SMEs review/prototype review 103 – 104
collection for A/B testing 317 – 319
dangers of data silo 445 – 446
process over technology 442 – 445
training and inference skew 440 – 441
guiding principles for production code 401 – 412
checking data provenance 404 – 408
embedding data cleansing 410 – 412
quality 50 – 52
DataFrame functions module 359
co-opting principles of Agile software engineering 31 – 35
communication and cooperation 33 – 35
embracing and expecting change 35
foundation of ML (machine learning) engineering 35 – 37
increasing project success 27 – 29
Data Science, Classification, and Related Methods (Hayashi) 27
data warehouse, serving from 498
debugging walls of text 255 – 257
decision-trees, complexity 531 – 536
delivery consistency 500 – 501
feedback drift and law of diminishing returns 346 – 347
setting up environment 540 – 544
case for clean experimentation environment 540 – 541
containers to deal with dependency hell 542
creating container-based pristine environment for experimentation 543 – 544
DevOps (development operations) 31
discussion boundaries 94 – 100
post-experimentation phase 96 – 98
post-research phase discussion 95 – 96
docker pull continuumio/anaconda3 command 544
feedback drift and law of diminishing returns 346 – 347
drivers, handling tuning with SparkTrials 218 – 222
lightweight scripted style (imperative) 357 – 361
ER (entity-relationship) diagrams 409
estimating amount of work 73 – 74
handling errors the right way 286 – 288
estimating amount of work 73 – 74
scoping research phase, importance of 68 – 73
experimental update meeting 102 – 103
experimentation 13 – 15, 64 – 74, 124 – 241
choosing tech for platform and team 215 – 227
handling tuning from driver with SparkTrials 218 – 222
handling tuning from workers with pandas_udf 222 – 226
using new paradigms for teams 226 – 227
estimating amount of work 73 – 74
CDD (chaos-driven development) 115
maintainability and extensibility 112 – 113
PDD (prayer-driven development) 114 – 115
RDD (resume-driven development) 115 – 116
TDD (test-driven development) or FDD (feature-driven development) 113
version control, branch strategies, and working with others 234 – 236
reading API documentation 130 – 135
possibilities, whittling down 190 – 196
evaluating prototypes properly 191 – 193
questions in planning session 193 – 196
moving from script to reusable code 146 – 153
performing data analysis 139 – 146
asynchronous concurrency 239 – 241
scoping research phase 68 – 73
running quick forecasting tests 172 – 190
setting guidelines in code 163 – 172
using Hyperopt to tune complex forecasting problem 208 – 214
explainable artificial intelligence (XAI) 460
ExponentialSmoothing() class 211
leaning heavily on prior art 448 – 450
FDD (feature-driven development) 113
features, monitoring 412 – 416
feature stores 441 – 442, 482 – 489
creating validation dataset 173 – 174
rapid testing for ARIMA 184 – 186
rapid testing of Holt-Winters exponential smoothing algorithm 186 – 190
rapid testing of VAR model approach 175 – 182
FP-growth (frequent-pattern-growth) market-basket analysis algorithms 95
frameworks, generalization and 379 – 381
GANs (generative adversarial networks) 15
GDPR (General Data Protection Regulation) 407
_generate_boundaries() method 265
generate_hyperopt_report() function 213
generate_log_map_and_plot() function 279
global mutable objects 288 – 291
encapsulation to prevent mutable side effects 290 – 291
hacking (cowboy development) 115
HIPAA (Health Insurance Portability and Accountability Act) 407
Holt-Winters exponential smoothing algorithm 186 – 190
HSD (honestly significant difference) tests 327
HWES (Holt-Winters Exponential Smoothing) model 208
TPEs (tree-structured Parzen estimators) 204 – 205
tuning complex forecasting problem 208 – 214
IDE (integrated development environment) 17
imperative scripted style 357 – 361
implementation, simplicity in 424 – 426
prediction serving architecture 497 – 499
Shapley additive explanations 461 – 463
approximate Shapley value estimation 461 – 463
using shap 463 – 466, 469
lightweight scripted style 357 – 361
linear relationship algorithm 521 – 523
version control, branch strategies, and working with others 234 – 236
LSTM (long short-term memory) 144
mad scientist developers 375 – 377
microbatch streaming 502 – 503
microservice framework 498 – 499
algorithmic complexity for 536 – 539
dangers of open source 390 – 392
generalization and frameworks 379 – 381
optimizing too early 382 – 390
technology-driven development vs. solution-driven development 393 – 395
unintentional obfuscation 364 – 379
ML (machine learning) engineering 3 – 25
planning 8 – 10
data science and foundation of 35 – 37
leveraging A/B testing for attribution calculations 316 – 333
evaluating categorical metrics 329 – 333
evaluating continuous metrics 319 – 323
using alternative displays and tests 325 – 329
measuring model attribution 302 – 316
clarifying correlation vs. causation 312 – 316
prediction performance 302 – 310
debugging walls of text 255 – 257
designing modular ML code 257 – 264
using test-driven development for ML 264 – 267
everything in model life cycle 417 – 421
moving average (q) variable 184
mutable objects, global 288 – 291
encapsulation to prevent mutable side effects 290 – 291
MVP review 98 – 99, 106 – 107
Natural Language Processing with Python (Bird, Klein, and Loper) 514
NDCG (normalized discounted cumulative gain) metrics 45
mad scientist developers 375 – 377
troublesome coding habits 378 – 379
objective_function function 208
OLTP (online transaction processing) storage layer 441
p (autoregressive parameters) variable 184
partial autocorrelation test 154
PDD (prayer-driven development) 114 – 115
pdf (probability density function) 341, 345
phases, experimentation 66 – 67
PII (personally identifiable information) 407
planning 8 – 10, 38 – 60, 126 – 137
assumption of business knowledge 50
assumption of data quality 50 – 52
assumption of functionality 52
experimentation by solution building 58 – 60
reading API documentation 130 – 135
quick visualization of dataset 127 – 129
existing code used for project 195
getting predictions to end users 195
inference running location 195
running location for training 194
plot_predictions() function 213
pmf (probability mass function) 341, 345
polynomial relationship 524 – 529
post-experimentation phase 96 – 98
post-research phase discussion 95 – 96
prayer-driven development (PDD) 114 – 115
prediction performance 302, 308 – 310
prediction serving architecture 490 – 508
bulk external delivery 500 – 502
delivery consistency 500 – 501
determining serving needs 493 – 497
serving from database or data warehouse 498
serving from microservice framework 498 – 499
microbatch streaming 502 – 503
real-time server-side 503 – 507
burst volume and high volume 504 – 507
moving from script to reusable code 146 – 153
performing data analysis 139 – 146
preproduction review 100, 107 – 108
printing, logging and 232 – 234
print statements 170, 232 – 233, 256, 416
probability density function (pdf) 341, 345
process over technology 442 – 445
prediction serving architecture 490 – 508
avoiding cargo cult ML (machine learning) behavior 432 – 437
monitoring everything in model life cycle 417 – 421
wireframing ML (machine learning) projects 426 – 431
project-based meetings 89 – 93
q (moving average) variable 184
dangers of data silo 445 – 446
process over technology 442 – 445
training and inference skew 440 – 441
end user vs. internal use testing 453 – 460
fallbacks and cold starts 447 – 452
leaning heavily on prior art 448 – 450
model interpretability 460 – 469
Shapley additive explanations 461 – 463
using shap 463 – 466, 469
of Holt-Winters exponential smoothing algorithm 186 – 190
of VAR model approach 175 – 182
RDBMS (relational database management system) 137
RDD (resilient distributed dataset) 223, 359
RDD (resume-driven development) 115 – 116
real-time server-side 503 – 507
burst volume and high volume 504 – 507
recurrent neural networks (RNNs) 144
REPL (read-eval-print loop) 128
researching 10 – 12, 126 – 130
scoping phase, importance of 68 – 73
visualization of dataset 127 – 129
returns, diminishing 346 – 347
RMSE (root mean squared error) 44
RNNs (recurrent neural networks) 144
ROI (return on investment) 473
algorithmic complexity for ML (machine learning) 536 – 539
analyzing decision-tree complexity 531 – 536
asynchronous concurrency 239 – 241
scoping 10 – 12
serving architecture, prediction 490 – 508
bulk external delivery 500 – 502
delivery consistency 500 – 501
determining serving needs 493 – 497
integrated models (edge deployment) 507 – 508
serving from database or data warehouse 498
serving from microservice framework 498 – 499
microbatch streaming 502 – 503
real-time server-side 503 – 507
burst volume and high volume 504 – 507
SGD (stochastic gradient descent) 536
Shapley additive explanations 461 – 463
approximate Shapley value estimation 461 – 463
shap package 461, 463, 466, 468 – 469, 475, 480
show-off type 373 – 375
in problem definitions 423 – 424
singular value decomposition (SVD) model 58
SLA, determining serving needs 494
solution-driven development 393 – 395
handling tuning from driver with SparkTrials 218 – 222
handling tuning from workers with pandas_udf 222 – 226
using new paradigms for teams 226 – 227
SPC (statistical process control) rules 345
SVD (singular value decomposition) model 58
TDD (test-driven development) 113, 264 – 267
technology, process over 442 – 445
technology-driven development 393 – 395
running quick forecasting tests 172 – 190
creating validation dataset 173 – 174
rapid testing for ARIMA 184 – 186
rapid testing of Holt-Winters exponential smoothing algorithm 186 – 190
rapid testing of VAR model approach 175 – 182
setting guidelines in code 163 – 172
baseline comparison visualization 164 – 167
TPEs (tree-structured Parzen estimators) 204 – 205
training, inference skew and 440 – 441
handling from driver with SparkTrials 218 – 222
handling from workers with pandas_udf 222 – 226
TPEs (tree-structured Parzen estimators) 204 – 205
using Hyperopt to tune complex forecasting problem 208 – 214
VectorAssembler constructor 360
visualization of dataset 127 – 129
VM (virtual machine) container 215
waterfall plots, shap 467 – 469
wireframing ML projects 426 – 431
workers, handling tuning with pandas_udf 222 – 226
XAI (explainable artificial intelligence) 460