Index

Symbols

.632 bootstrap 172

A

accuracy 114

acquisition function 261, 295

adaptive overfitting 150

adversarial testing 25

adversarial validation 160, 180

example 181, 182

using 179-181

AI Ethics

reference link 74

Akiyama, Osamu 472, 473

albumentations 346, 347

characteristics 346

Alibaba Cloud 11

reference link 11

AlphaZero

reference link 431

Analytics competitions 19

Analytics Vidhya 11, 462

reference link 11

Annuals competitions 18

attention 288

AUC metric 105

augmentation techniques

for image 335

for text 414

auto-correlation 166

autoencoders

denoising, with 226-230

AutoGluon 330

AutoViz

reference link 204

average precision 118

average precision at k (AP@K) 134

averaging ensembling technique 307-309

B

bagging technique 305

basic optimization techniques 242

grid search 243-245

halving search 246, 247

random search 245, 246

basic techniques, text augmentation strategies

swapping 417-420

synonym replacement 415, 416

Bayesian optimization 261

customizing 268-276

extending, to neural architecture search 276-285

BayesSearchCV function 268

Becker, Dan 476

Bhatia, Ruchi 299-301

bias and variance 157, 158

bivariate analysis 204

blending 276, 317

advantages 317

best practices 318-323

disadvantages 318

Bober-Irizar, Mikel 474

bootstrap 171-173

Boruta 222

BorutaShap 222

bottleneck DAEs 227

bounding box 130, 357

C

calibration function 143

CatBoost 257

parameters 257, 258

reference link 213

URL 257

Chesler, Ryan 174

classification task

binary 100

metrics 114

multi-class problem 101

multi-label problem 101

tasks 100

CodaLab 11

reference link 11

Code competitions 22

coefficient of determination 110

Cohen Kappa score 124

column block 256

Common Objects in Context (COCO) 372

Common Task Framework (CTF) paradigm 23

Community competitions 19

competition types 17

Analytics competitions 19

Annuals 18

challenges, resolving 24-26

Code competitions 22

Community competitions 19

Featured 17

Getting Started competitions 18

Masters 18

Playground competitions 19

Recruitment competitions 18

Research 18

competitions page, Kaggle 12

code menu 14

data menu 14

overview menu 14

rules menu 14

complex stacking and blending solutions

creating 329, 330

computational resources 26

concept drift 184

conda

reference link 464

confusion matrix 115

connections

building, with competition data scientists 470, 471

Connect X 426

agents 427

board 427

reference link 426

submission, building 427-430

Conort, Xavier 331

CookieCutter

reference link 464

correlation matrix 314

cost function 99

criticisms, Kaggle competitions 33, 34

cross-entropy 119

cross-validation strategy

averaging 315, 316

cross_val_predict

reference link 171

CrowdANALYTIX 10

reference link 10

CTGAN 196

D

Danese, Alberto 259-261

data

gathering 42

size, reducing 208, 209

data augmentation 335

data leakage 187

data science competition platforms 4-6

Alibaba Cloud 11

Analytics Vidhya 11

CodaLab 11

CrowdANALYTIX 10

DrivenData 10

Kaggle competition platform 7

minor platforms 11

Numerai 10

Signate 10

Zindi 10

dataset

leveraging 455, 456

setting up 37-41

working with 48, 49

Data Version Control (DVC) 465

URL 201

deep neural networks (DNNs) 276

deep stack DAEs 228

denoising autoencoders (DAEs) 227, 229

bottleneck DAEs 227

deep stack DAEs 228

Deotte, Chris 338, 347

Detectron2 371

Dev.to 462

Dice coefficient 132

discussion forum, Kaggle

bookmarking 85

example discussion approaches 86-89

filtering 83

for competition page 84

Netiquette 92

working 79-81

DistilBERT 393

distributions of test data

handling 183, 184

distributions of training data

handling 183, 184

Doarakko

URL 466

Dockerfile 457

DrivenData 10

reference link 10

E

early stopping 354

EfficientNet B0 350

embeddings 404

empirical Bayesian approach 218

ensemble algorithms 304, 305

ensemble selection technique 320

ensembling 303, 304

averaging techniques 305

models, averaging 308, 309

strategies 305

training cases, sampling 308

error function 100

evaluation metric 98

custom metrics 136-139

custom objective functions 136-139

optimizing 135

exploratory data analysis (EDA) 20, 453

dimensionality reduction, with t-SNE 205-207

dimensionality reduction, with UMAP 205-207

significance 203-205

Extra Trees 252

F

F1 score 119

fast.ai

reference link 8

fastpages 462

F-beta score 119

Featured competitions 17

feature engineering, techniques

feature importance, using to evaluate work 220-222

features, deriving 211-213

meta-features, based on columns 213, 214

meta-features, based on rows 213, 214

target encoding 215-220

feature leakage 188

Fink, Laura 384

forking 57

FreeCodeCamp 462

G

Game AI

reference link 73

geometric mean 312

Getting Started competitions 18

examples 18

Gists 463

GitHub

Notebook, saving to 60-62

Global Wheat Detection competition

reference link 357

Goal-Impact-Challenges-Finding method 484

Google Cloud Platform (GCP)

upgrading to 64-66

Google Colab 465

Kaggle datasets, using 49-51

reference link 49

Google Landmark Recognition 2020

reference link 18

Gonen, Firat 442

gradient boosting 305

gradient tree boosting 253

Gradio

URL 465

grid search 242-245

GroupKFold

reference link 166

H

Hacker Noon 461

Halite game 439, 440

board 440

reference link 439

halving search 246, 247

harmonic mean 312

Henze, Martin 70-72

Heroku app

URL 466

HistGradientBoosting 258, 259

hyperparameters 258

HuggingFace Spaces

URL 465

hyperband optimization 285

Hyperopt 296

I

image classification

problems, handling 349-356

ImageDataGenerator approach 341-344

imitation learning 441

independent and identically distributed 152

instance segmentation 130

inter-annotation agreement 124

Intermediate ML

reference link 73

interquartile range (IQR) 213

intersection over union (IoU) 131, 377

Ishihara, Shotaro 412

isotonic regression 143

IterativeStratification

reference link 165

J

Jaccard index 131

Janson, Giuliano 184

K

Kaggle API 14

competition types 17

reference link 14

Kaggle datasets

legal caveats 51, 52

reference link 37

using, in Google Colab 49-51

Kaggle Days 30, 481

Kaggle Learn

courses 73-76

reference link 73

Kaggle meetups

participating in 481

KaggleNoobs

reference link 29

Kaggle Notebooks 14, 27, 53, 66

Kaggle, online presence

blogs 460-463

GitHub 463-465

publications 460-463

Kaggler-ja

reference link 29

Kaggle Twitter profile

reference link 12

Kagoole

URL 466

KBinsDiscretizer

reference link 164

KDD Cup 5

KDnuggets 462

Keras built-in augmentations 341

ImageDataGenerator approach 341-344

preprocessing layers 345

KerasTuner 285

models, creating 285-294

URL 285

Kernels 53

k-fold cross-validation 161-163

k-fold variations 164-168

Knowledge Discovery and Data Mining

reference link 5

L

L1 norm 113

L2 norm 113

Larko, Dmitry 155

leaderboard

snooping on 150-152

Leaf Classification

reference link 101

leakage

handling 187-190

leave-one-out (LOO) 159

Lee, Jeong-Yoon 479

lexical diversity 402

LightGBM 253

hyperparameters 254, 255

reference link 212

references 253

linear models 250

logarithmic mean 312

log loss 105, 119

loss function 99

Lukyanenko, Andrey 125

Lux AI game

reference link 441

M

Machine Learning Explainability

reference link 74

majority voting 309, 310

Maranhão, Andrew 43-47

Masters competitions 18

Matthews correlation coefficient (MCC) 121, 122

mean absolute error (MAE) 109, 113

Mean Average Precision at K (MAP@{K}) 133-135

mean of powers 312

mean squared error (MSE) 109

Medium

URL 460

Medium publications

references 461

meta-features 213

Meta Kaggle dataset 102-105

reference link 102

meta-model 317

metrics, for classification

accuracy 114-116

F1 score 119

log loss 119

Matthews correlation coefficient (MCC) 121, 122

precision metrics 116-118

recall metrics 116-118

ROC-AUC 120, 121

metrics, for multi-class classification

macro averaging 123

Macro-F1 123

Mean-F1 124

micro averaging 123

Micro-F1 123

multiclass log loss (MeanColumnwiseLogLoss) 123

weighting 123

metrics, for regression 109

mean absolute error (MAE) 113

mean squared error (MSE) 109-111

root mean squared error (RMSE) 111

root mean squared log error (RMSLE) 112

R squared 110, 111

metrics, multi-label classification and recommendation problems 133

MAP@{K} 134

metrics, object detection problems 129-131

Dice coefficient 132, 133

intersection over union (IoU) 131

mind-reading illusion 92

mixup augmentation technique 229

MLFlow

URL 201

models

blending, meta-model used 317

stacking 323-327

model validation system

tuning 176-178

Mulla, Rob 306

multi-armed bandit (MAB) 435

multi-class classification

metrics 122

multi-head attention 288

multi-label classification and recommendation problems

metrics 133

multiple machine learning models, averaging

cross-validation strategy, averaging in 315, 316

majority voting 309-312

predictions, averaging 312-314

ROC-AUC evaluation averaging, rectifying 316

weighted averages 314, 315

N

Nash equilibrium 432, 433

natural language processing (NLP) 389

neptune.ai

reference link 201

nested cross-validation 168-170

Netiquette 92

neural networks

for tabular competitions 231-235

Neural Oblivious Decision Ensembles (NODE) 234

never-before-seen metrics

handling 105-107

nlpaug 420-423

No Free Lunch theorem 5

non-linear transformations 110

Notebook

Accelerator 56

Environment 56

Internet 57

Language 56

leveraging, for your career 452

progression requirements 67, 68

running 58-60

saving, to GitHub 60-62

setting up 54-56

using 63, 64

Numerai 10

reference link 10

O

object detection

handling 357-370

metrics 129

objective function 99

OCR error 421

Olteanu, Andrada 74-76

one-hot encoding 211

Onodera, Kazuki 247-249

Open Data Science Network

reference link 29

open domain Q&A 398-411

opportunities 35

Optuna

TPE approach 295-299

ordinal task 100, 101

treating, as multi-class problem 101

treating, as regression problem 102

out-of-fold (OOF) predictions 143, 170, 323, 410

producing 170, 171

overfitting 157

P

pandas, Kaggle Learn course 73

Pandey, Parul 19-21

performance tiers, Kaggle 32

Pipelines

feature 409

pixel accuracy 130

Platt’s scaling 143

Playground competitions 19

examples 19

poetry

reference link 464

portfolio

building, with Kaggle 447-449

datasets, leveraging 455, 456

discussions, leveraging 452-454

notebooks, leveraging 452-454

positive class 100

precision metric 117

precision/recall trade-off 117

reference link 118

Preda, Gabriel 457-459

predicted probability 141, 142

adjustment 142-144

predictions

post-processing 139-141

preprocessing layers 345

private test set 16

probabilistic evaluation methods 161

k-fold cross-validation 161-163

k-fold variations 164-168

out-of-fold predictions (OOF), producing 170, 171

probability calibration

reference link 316

proxy function 261

pseudo-labeling 224-226

public test set 15

Puget, Jean-François 235

Q

Quadratic Weighted Kappa 105

R

Rajkumar, Sudalai 144

random forests 251, 305

random search 242, 245, 246

random state

setting, for reproducibility 202, 203

rankings 32, 33

Contributor 33

Expert 33

Grandmaster 33

Master 33

Novice 33

Rao, Rohan 108, 109

RAPIDS

reference link 206

recall metric 117

receiver operating characteristic curve (ROC curve) 120, 121

Recruitment competitions 18

examples 18

Rectified Adam 289

recurrent neural networks (RNNs) 287

regression task 100

metrics 109

regularization 417

reinforcement learning (RL) 100, 425, 426

reproducibility

random state, setting for 202, 203

Research competitions 18

ROC-AUC evaluation metric 313

averaging, rectifying 316

rock-paper-scissors 431

payoff matrix 432

reference link 431

using 432-434

root mean squared error (RMSE) 109-111

root mean squared log error (RMSLE) 112-114

run-length encoding (RLE) 371

converting, to COCO 372, 373

S

sampling approaches, ensembling

bagging 308

pasting 308

random patches 308

random subspaces 308

sampling strategies 159

Santa competition 2020 435-439

reference link 435

reward decay 435

Scikit-multilearn

reference link 164

Scikit-optimize

using 262-267

scoring function 99

semantic segmentation 130, 371-384

sentiment analysis 389-396

shake-ups 151

shared words 403

ShuffleSplit

reference link 171

sigmoid method 143

Signate

reference link 10

simple arithmetic average 312

Simple format 21

simulated typo 421

size

reducing, of data 208, 209

Spearman correlation 399

splitting strategies

using 159

stability selection 221

stacking 305, 323-327

variations 327-329

STAR approach 483, 484

Stochastic Weight Averaging (SWA) 289

stratified k-fold

reference link 164

Streamlit

URL 465

sub-sampling 171

sum of squared errors (SSE) 109

sum of squares total (SST) 110

support-vector machine classification (SVC) 243

support-vector machines (SVMs) 243, 250, 251

surrogate function 261, 295

swap noise 228

swapping 417

symmetric mean absolute percentage error (sMAPE)

reference link 114

synonym replacement 415

Synthetic Data Vault

reference link 197

systematic experimentation 153

T

TabNet 234

tabular competitions

neural networks 231-235

Tabular Playground Series 196-201

target encoding 216

task types

classification 100

ordinal 101

regression 100

teaming 28, 29

Telstra Network Disruption

reference link 189

tensor processing units (TPU) 338

test set 168

predicting 170

text augmentation strategies 414

basic techniques 415-420

nlpaug 420-423

TF-IDF representations 406

Thakur, Abhishek 397

Titericz, Gilberto 449-452

tokenization 392

TPE approach

in Optuna 295-299

train-test split 160

transformations, images 336

demonstrating 336-341

TransformedTargetRegressor function

reference link 111

Tree-structured Parzen Estimators (TPEs) 261, 262

t-SNE 205-207

Tunguz, Bojan 223

two-stage competition 21

U

UMAP 205-207

univariate analysis 204

Universal Sentence Encoder

reference link 404

usability index 41

V

validation 153, 154

validation loss 157

validation set 168

predicting 170

version control system 465

W

weighted average 312-314

Weighted Root Mean Squared Scaled Error

reference link 114

Weights and Biases

reference link 201

X

XGBoost 255, 256

parameters 256, 257

reference link 212, 255

Xie, Yifan 90, 91

Y

YOLOv5 357

Z

Zhang, Yirun 471

Zindi 10

reference link 10

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there: you can also get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

  1. Scan the QR code or visit the link below

    https://packt.link/free-ebook/9781801817479

  2. Submit your proof of purchase
  3. That’s it! We’ll send your free PDF and other benefits to your email directly