Symbols
.632 bootstrap 172
A
accuracy 114
adaptive overfitting 150
adversarial testing 25
adversarial validation 160, 180
AI Ethics
reference link 74
characteristics 346
Alibaba Cloud 11
reference link 11
AlphaZero
reference link 431
Analytics competitions 19
reference link 11
Annuals competitions 18
attention 288
AUC metric 105
augmentation techniques
for image 335
for text 414
auto-correlation 166
autoencoders
AutoGluon 330
AutoViz
reference link 204
average precision 118
average precision at k (AP@K) 134
averaging ensembling technique 307-309
B
bagging technique 305
basic optimization techniques 242
basic techniques, text augmentation strategies
Bayesian optimization 261
extending, to neural architecture search 276-285
BayesSearchCV function 268
Becker, Dan 476
bivariate analysis 204
advantages 317
disadvantages 318
Bober-Irizar, Mikel 474
Boruta 222
BorutaShap 222
bottleneck DAEs 227
C
calibration function 143
CatBoost 257
reference link 213
URL 257
Chesler, Ryan 174
classification task
binary 100
metrics 114
multi-class problem 101
multi-label problem 101
tasks 100
CodaLab 11
reference link 11
Code competitions 22
coefficient of determination 110
Cohen Kappa score 124
column block 256
Common Objects in Context (COCO) 372
Common Task Framework (CTF) paradigm 23
Community competitions 19
competition types 17
Analytics competitions 19
Annuals 18
Code competitions 22
Community competitions 19
Featured 17
Getting Started competitions 18
Masters 18
Playground competitions 19
Recruitment competitions 18
Research 18
competitions page, Kaggle 12
code menu 14
data menu 14
overview menu 14
rules menu 14
complex stacking and blending solutions
computational resources 26
concept drift 184
conda
reference link 464
confusion matrix 115
connections
building, with competition data scientists 470, 471
Connect X 426
agents 427
board 427
reference link 426
Conort, Xavier 331
CookieCutter
reference link 464
correlation matrix 314
cost function 99
criticisms, Kaggle competitions 33, 34
cross-entropy 119
cross-validation strategy
cross_val_predict
reference link 171
CrowdANALYTIX 10
reference link 10
CTGAN 196
D
data
gathering 42
data augmentation 335
data leakage 187
data science competition platforms 4-6
Alibaba Cloud 11
Analytics Vidhya 11
CodaLab 11
CrowdANALYTIX 10
DrivenData 10
Kaggle competition platform 7
minor platforms 11
Numerai 10
Signate 10
Zindi 10
dataset
Data Version Control (DVC) 465
URL 201
deep neural networks (DNNs) 276
deep stack DAEs 228
denoising autoencoders (DAEs) 227, 229
bottleneck DAEs 227
deep stack DAEs 228
Detectron2 371
Dev.to 462
Dice coefficient 132
discussion forum, Kaggle
bookmarking 85
example discussion approaches 86-89
filtering 83
for competition page 84
Netiquette 92
DistilBERT 393
distributions of test data
distributions of training data
Doarakko
URL 466
Dockerfile 457
DrivenData 10
reference link 10
E
early stopping 354
EfficientNet B0 350
embeddings 404
empirical Bayesian approach 218
ensemble selection technique 320
averaging techniques 305
strategies 305
training cases, sampling 308
error function 100
evaluation metric 98
custom objective functions 136-139
optimizing 135
exploratory data analysis (EDA) 20, 453
dimensionality reduction, with t-SNE 205-207
dimensionality reduction, with UMAP 205-207
Extra Trees 252
F
F1 score 119
fast.ai
reference link 8
fastpages 462
F-beta score 119
Featured competitions 17
feature engineering, techniques
feature importance, using to evaluate work 220-222
meta-features, based on columns 213, 214
meta-features, based on rows 213, 214
feature leakage 188
Fink, Laura 384
forking 57
FreeCodeCamp 462
G
Game AI
reference link 73
geometric mean 312
Getting Started competitions 18
examples 18
Gists 463
GitHub
Global Wheat Detection competition
reference link 357
Goal-Impact-Challenges-Finding method 484
Google Cloud Platform (GCP)
Google Colab 465
reference link 49
Google Landmark Recognition 2020
reference link 18
Gonen, Firat 442
gradient boosting 305
gradient tree boosting 253
Gradio
URL 465
GroupKFold
reference link 166
H
Hacker Noon 461
board 440
reference link 439
harmonic mean 312
Heroku app
URL 466
hyperparameters 258
HuggingFace Spaces
URL 465
hyperband optimization 285
Hyperopt 296
I
image classification
ImageDataGenerator approach 341-344
imitation learning 441
independent and identically distributed 152
instance segmentation 130
inter-annotation agreement 124
Intermediate ML
reference link 73
interquartile range (IQR) 213
intersection over union (IoU) 131, 377
Ishihara, Shotaro 412
isotonic regression 143
IterativeStratification
reference link 165
J
Jaccard index 131
Janson, Giuliano 184
K
Kaggle API 14
competitions types 17
reference link 14
Kaggle datasets
reference link 37
Kaggle Learn
reference link 73
Kaggle meetups
participating in 481
KaggleNoobs
reference link 29
Kaggle Notebooks 14, 27, 53, 66
Kaggle, online presence
Kaggler-ja
reference link 29
Kaggle Twitter profile
reference link 12
Kagoole
URL 466
KBinsDiscretizer
reference link 164
KDD Cup 5
KDnuggets 462
Keras built-in augmentations 341
ImageDataGenerator approach 341-344
preprocessing layers 345
KerasTuner 285
URL 285
Kernels 53
k-fold cross-validation 161-163
Knowledge Discovery and Data Mining
reference link 5
L
L1 norm 113
L2 norm 113
Larko, Dmitry 155
leaderboard
Leaf Classification
reference link 101
leakage
leave-one-out (LOO) 159
Lee, Jeong-Yoon 479
lexical diversity 402
LightGBM 253
reference link 212
references 253
linear models 250
logarithmic mean 312
loss function 99
Lukyanenko, Andrey 125
Lux AI game
reference link 441
M
Machine Learning Explainability
reference link 74
Masters competitions 18
Matthews correlation coefficient (MCC) 121, 122
mean absolute error (MAE) 109, 113
Mean Average Precision at K (MAP@{K}) 133-135
mean of powers 312
mean squared error (MSE) 109
Medium
URL 460
Medium publications
references 461
meta-features 213
reference link 102
meta-model 317
metrics, for classification
F1 score 119
log loss 119
Matthews correlation coefficient (MCC) 121, 122
metrics, for multi-class classification
macro averaging 123
Macro-F1 123
Mean-F1 124
micro averaging 123
Micro-F1 123
multiclass log loss (MeanColumnwiseLogLoss) 123
weighting 123
metrics, for regression 109
mean absolute error (MAE) 113
mean squared error (MSE) 109-111
root mean squared error (RMSE) 111
root mean squared log error (RMSLE) 112
metrics, multi-label classification and recommendation problems 133
MAP@{K} 134
metrics, object detection problems 129-131
intersection over union (IoU) 131
mind-reading illusion 92
mixup augmentation technique 229
MLFlow
URL 201
models
blending, meta-model used 317
model validation system
Mulla, Rob 306
multi-armed bandit (MAB) 435
multi-class classification
metrics 122
multi-head attention 288
multi-label classification and recommendation problems
metrics 133
multiple machine learning models, averaging
cross-validation strategy, averaging in 315, 316
predictions, averaging 312-314
ROC-AUC evaluation averaging, rectifying 316
N
natural language processing (NLP) 389
neptune.ai
reference link 201
nested cross-validation 168-170
Netiquette 92
neural networks
for tabular competitions 231-235
Neural Oblivious Decision Ensembles (NODE) 234
never-before-seen metrics
No Free Lunch theorem 5
non-linear transformations 110
Notebook
Accelerator 56
Environment 56
Internet 57
Language 56
leveraging, for your career 452
progression requirements 67, 68
Numerai 10
reference link 10
O
object detection
metrics 129
objective function 99
OCR error 421
one-hot encoding 211
Open Data Science Network
reference link 29
opportunities 35
Optuna
treating, as multi-class problem 101
treating, as regression problem 102
out-of-fold (OOF) predictions 143, 170, 323, 410
overfitting 157
P
pandas, Kaggle Learn course 73
performance tiers, Kaggle 32
Pipelines
feature 409
pixel accuracy 130
Platt’s scaling 143
Playground competitions 19
examples 19
poetry
reference link 464
portfolio
building, with Kaggle 447-449
discussions, leveraging 452-454
positive class 100
precision metric 117
precision/recall trade-off 117
reference link 118
predicted probability 141, 142
predictions
preprocessing layers 345
private test set 16
probabilistic evaluation methods 161
k-fold cross-validation 161-163
out-of-fold predictions (OOF), producing 170, 171
probability calibration
reference link 316
proxy function 261
public test set 15
Puget, Jean-François 235
Q
Quadratic Weighted Kappa 105
R
Rajkumar, Sudalai 144
random state
setting, for reproducibility 202, 203
Contributor 33
Expert 33
Grandmaster 33
Master 33
Novice 33
RAPIDS
reference link 206
recall metric 117
receiver operating characteristic curve (ROC curve) 120, 121
Recruitment competitions 18
examples 18
Rectified Adam 289
recurrent neural networks (RNNs) 287
regression task 100
metrics 109
regularization 417
reinforcement learning (RL) 100, 425, 426
reproducibility
random state, setting for 202, 203
Research competitions 18
ROC-AUC evaluation metric 313
averaging, rectifying 316
rock-paper-scissors 431
payoff matrix 432
reference link 431
root mean squared error (RMSE) 109-111
root mean squared log error (RMSLE) 112-114
run-length encoding (RLE) 371
S
sampling approaches, ensembling
bagging 308
pasting 308
random patches 308
random subspaces 308
sampling strategies 159
Santa competition 2020 435-439
reference link 435
reward decay 435
Scikit-multilearn
reference link 164
Scikit-optimize
scoring function 99
semantic segmentation 130, 371-384
shake-ups 151
shared words 403
ShuffleSplit
reference link 171
sigmoid method 143
Signate
reference link 10
simple arithmetic average 312
Simple format 21
simulated typo 421
size
Spearman correlation 399
splitting strategies
using 159
stability selection 221
Stochastic Weight Averaging (SWA) 289
stratified k-fold
reference link 164
Streamlit
URL 465
sub-sampling 171
sum of squared errors (SSE) 109
sum of squares total (SST) 110
support-vector machine classification (SVC) 243
support-vector machines (SVMs) 243, 250, 251
swap noise 228
swapping 417
symmetric mean absolute percentage error (sMAPE)
reference link 114
synonym replacement 415
Synthetic Data Vault
reference link 197
systematic experimentation 153
T
TabNet 234
tabular competitions
Tabular Playground Series 196-201
target encoding 216
task types
classification 100
ordinal 101
regression 100
Telstra Network Disruption
reference link 189
tensor processing units (TPU) 338
test set 168
predicting 170
text augmentation strategies 414
TF-IDF representations 406
Thakur, Abhishek 397
tokenization 392
TPE approach
train-test split 160
transformations, images 336
TransformedTargetRegressor function
reference link 111
Tree-structured Parzen Estimators (TPEs) 261, 262
Tunguz, Bojan 223
two-stage competition 21
U
univariate analysis 204
Universal Sentence Encoder
reference link 404
usability index 41
V
validation loss 157
validation set 168
predicting 170
version control system 465
W
Weighted Root Mean Squared Scaled Error
reference link 114
Weights and Biases
reference link 201
X
Y
YOLOv5 357
Z
Zhang, Yirun 471
Zindi 10
reference link 10