Index
C
- Calinski-Harabasz Index, K-Means
- CART (classification and regression trees) algorithm, Decision Tree
- categorical encoding, Other Categorical Encoding
- categories, pulling from strings, Pulling Categories from Strings-Pulling Categories from Strings
- class balance, Class Balance
- class prediction error, Class Prediction Error
- classification
  - algorithm families for, Various Families
  - asking a question to create predictive model for, Ask a Question
  - baseline model, Baseline Model
  - cleaning data, Clean Data-Clean Data
  - confusion matrix and, Confusion Matrix
  - decision tree, Decision Tree-Decision Tree
  - evaluation (see classification evaluation)
  - feature creation, Create Features-Create Features
  - gathering data, Gather Data
  - gradient boosted with LightGBM, Gradient Boosted with LightGBM-Gradient Boosted with LightGBM
  - imbalanced classes (see imbalanced classes)
  - imports, Imports
  - imputing data, Impute Data
  - k-nearest neighbor, K-Nearest Neighbor-K-Nearest Neighbor
  - learning curve, Learning Curve
  - logistic regression, Logistic Regression-Logistic Regression
  - model creation, Create Model
  - model deployment, Deploy Model
  - model evaluation, Evaluate Model
  - model optimization, Optimize Model
  - models, Classification-TPOT
  - Naive Bayes classifier, Naive Bayes-Naive Bayes
  - normalizing data, Normalize Data
  - pipelines for, Classification Pipeline-Classification Pipeline
  - project layout suggestion, Project Layout Suggestion
  - random forest, Random Forest-Random Forest
  - refactoring code, Refactor
  - ROC curve, ROC Curve
  - sampling data, Sample Data
  - SHAP and, Shapley
  - stacking, Stacking
  - support vector machine, Support Vector Machine-Support Vector Machine
  - terms for data, Terms for Data
  - TPOT, TPOT-TPOT
  - walkthrough with Titanic dataset, Classification Walkthrough: Titanic Dataset-Deploy Model
  - XGBoost, XGBoost-XGBoost
- classification and regression trees (CART) algorithm, Decision Tree
- classification evaluation, Metrics and Classification Evaluation-Discrimination Threshold
  - accuracy, Accuracy
  - class balance, Class Balance
  - class prediction error, Class Prediction Error
  - classification report, Classification Report
  - confusion matrix, Confusion Matrix-Confusion Matrix
  - cumulative gains plot, Cumulative Gains Plot-Cumulative Gains Plot
  - discrimination threshold, Discrimination Threshold
  - F1, F1
  - lift curve, Lift Curve
  - metrics, Metrics
  - precision, Precision
  - precision-recall curve, Precision-Recall Curve
  - recall, Recall
  - ROC, ROC
- classification report, Classification Report
- cleaning data, Clean Data-Clean Data, Cleaning Data-Replacing Missing Values
- clustering, Clustering-Understanding Clusters
- code, refactoring, Refactor
- coefficient of determination, Baseline Model, Metrics
- collinear columns, Collinear Columns
- columns
- col_na feature, Add col_na Feature
- conda
- confusion matrix, Confusion Matrix, Confusion Matrix-Confusion Matrix
- cookiecutter, Project Layout Suggestion
- correlation, in exploratory data analysis, Correlation-Correlation
- Cross-Industry Standard Process for Data Mining (CRISP-DM), Overview of the Machine Learning Process
- CSV files, Clean Data
- cumulative gains plot, Cumulative Gains Plot-Cumulative Gains Plot, Lift Curve
- cumulative plot, PCA
- curse of dimensionality, Feature Selection
D
- data
- date feature engineering, Date Feature Engineering
- Davies-Bouldin Index, K-Means
- decision tree, Decision Tree-Decision Tree
- dendrograms, Examining Missing Data, Agglomerative (Hierarchical) Clustering-Agglomerative (Hierarchical) Clustering
- dependence plots, Shapley, Shapley
- dimensionality reduction, Dimensionality Reduction-PHATE
- discrimination threshold, Discrimination Threshold
- downsampling, Downsampling Majority
- drop column importance, Random Forest
- dtreeviz, Decision Tree, Decision Tree
- dummy variables, Dummy Variables
E
- elbow method, PCA
- ensemble methods, Tree-based Algorithms and Ensembles
- evaluation tools (see classification evaluation)
- explained variance, Metrics
- exploratory data analysis, Exploring-Parallel Coordinates
  - box and violin plots, Box and Violin Plots
  - comparing two ordinal values, Comparing Two Ordinal Values
  - correlation, Correlation-Correlation
  - data size, Data Size
  - histograms for, Histogram
  - joint plot for, Joint Plot
  - pair grid for, Pair Grid
  - parallel coordinates plot, Parallel Coordinates
  - RadViz plot, RadViz
  - scatter plot for, Scatter Plot
  - summary statistics, Summary Stats
F
- F1, F1
- false negatives, Confusion Matrix
- false positives, Confusion Matrix, ROC
- fancyimpute, Impute Data, Imputing Data
- fastai, Date Feature Engineering
- fastcluster, Agglomerative (Hierarchical) Clustering
- feature
- feature engineering
- feature importance
  - decision trees, Decision Tree
  - feature selection, Feature Importance
  - LightGBM, Gradient Boosted with LightGBM, LightGBM Regression
  - model interpretation, Feature Importance
  - partial dependence plots, Partial Dependence Plots-Partial Dependence Plots
  - random forests, Random Forest-Random Forest
  - tree-based models, Evaluate Model
  - xgbfir package, XGBoost-XGBoost
  - XGBoost, XGBoost, XGBoost Regression
- feature selection, Feature Selection-Feature Importance
- frequency encoding, Frequency Encoding
J
- joint plot, exploratory data analysis with, Joint Plot
- Jupyter
L
- label encoding, Label Encoder
- Laplace smoothing, Naive Bayes
- lasso regression, Lasso Regression
- leaky features
- learning curve, Learning Curve, Learning Curve-Learning Curve
- libraries
- lift (term), Lift Curve
- lift curve, Lift Curve
- LightGBM
- linear regression, Linear Regression-Linear Regression
- Linux, library installation on, Installation with Pip
- loading plot, PCA
- Local Interpretable Model-Agnostic Explanations (LIME), LIME-LIME
- logistic regression, Logistic Regression-Logistic Regression
M
- machine learning, overview of process, Overview of the Machine Learning Process
- Macintosh, library installation on, Installation with Pip
- majority classes, Downsampling Majority
- manifold learning (see Uniform Manifold Approximation and Projection (UMAP))
- manual feature engineering, Manual Feature Engineering
- matplotlib
  - interactive scatter plots, PCA-PCA
  - t-SNE visualization, t-SNE
- mean absolute error, Metrics
- mean squared logarithmic error, Metrics
- metrics
- minority class
- missing data, Missing Data-Adding Indicator Columns
- missingno
- model
- model explanation/interpretation, Explaining Models-Shapley
- model selection, Model Selection-Learning Curve
- multicollinearity, Collinear Columns, Linear Regression
- multivariate data, Parallel Coordinates
- mutual information, Mutual Information
P
- pair grid, Pair Grid
- pairwise comparisons, Correlation-Correlation
- pandas
  - classification calculations, Confusion Matrix
  - column names, Column Names
  - data standardization, Standardize
  - DataFrame column correlation, Correlation-Correlation
  - determining data size, Data Size
  - dropping rows with missing data, Dropping Missing Data
  - dummy variable creation, Dummy Variables
  - feature examination in clusters, Understanding Clusters
  - for indicator columns, Adding Indicator Columns
  - for missing data bar plot, Examining Missing Data
  - frequency encoding, Frequency Encoding
  - histograms with, Histogram
  - iloc attribute, Summary Stats
  - imports with, Imports
  - imputing missing values with, Imputing Data
  - int64 vs. Int64 types, Clean Data
  - label encoding, Label Encoder
  - manual feature engineering, Manual Feature Engineering
  - ordinal category comparison, Comparing Two Ordinal Values
  - parallel coordinates plot, Parallel Coordinates
  - profile report with, Clean Data
  - RadViz plots, RadViz
  - scaling data to range, Scale to Range
  - scatter plot generation, Scatter Plot
  - summary stats, Summary Stats
  - updating columns, Column Names
- parallel coordinates plot, Parallel Coordinates
- partial dependence plots, Partial Dependence Plots-Partial Dependence Plots
- PCA (see principal component analysis)
- Pearson correlation, Scatter Plot, Correlation
- permutation importance, Random Forest
- PHATE (Potential of Heat-diffusion for Affinity-based Trajectory Embedding), PHATE-PHATE
- pip
- pipelines, Pipelines-PCA Pipeline
- Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE), PHATE-PHATE
- precision
- precision-recall curve, Precision-Recall Curve
- prediction error plot, Prediction Error Plot
- preprocessing data, Normalize Data, Preprocess Data-Manual Feature Engineering
  - and categorical_encoding library, Other Categorical Encoding
  - col_na feature, Add col_na Feature
  - date feature engineering, Date Feature Engineering
  - dummy variables, Dummy Variables
  - frequency encoding, Frequency Encoding
  - label encoding, Label Encoder
  - manual feature engineering, Manual Feature Engineering
  - pulling categories from strings, Pulling Categories from Strings-Pulling Categories from Strings
  - scaling to range, Scale to Range
  - standardizing, Clean Data, Normalize Data, Standardize
  - various categorical encoding approaches, Other Categorical Encoding
- principal component analysis (PCA), Principal Component Analysis
- probability plot, Normal Residuals
- pyjanitor, Create Features
R
- RadViz plot, RadViz
- random forest, Random Forest-Random Forest
- recall (sensitivity)
- receiver operating characteristic (ROC) curve, ROC Curve, ROC
- recursive feature elimination, Recursive Feature Elimination
- refactoring, Refactor
- regression, Regression-LightGBM Regression
  - baseline model, Baseline Model
  - decision tree, Decision Tree-Decision Tree
  - k-nearest neighbor, K-Nearest Neighbor-K-Nearest Neighbor
  - LightGBM for, LightGBM Regression-LightGBM Regression
  - linear, Linear Regression-Linear Regression
  - metrics, Metrics-Metrics
  - pipelines for, Regression Pipeline
  - random forest, Random Forest-Random Forest
  - SHAP and, Shapley
  - SVMs and, SVMs-SVMs
  - XGBoost for, XGBoost Regression-XGBoost Regression
- regression coefficients, Regression Coefficients
- regression evaluation, Metrics and Regression Evaluation-Prediction Error Plot
- regression models, Explaining Regression Models-Shapley
- regular expressions, Pulling Categories from Strings
- residuals plot, Residuals Plot, Normal Residuals
- rfpimp, Collinear Columns
- ROC (receiver operating characteristic) curve, ROC Curve, ROC
- root mean squared error, Metrics
S
- sample (term), Terms for Data
- sampling data, Sample Data
- sandbox environment, for library installation, Installation with Pip
- scaling data to range, Scale to Range
- scatter plot
- scikit-learn
  - categorical encoding, Other Categorical Encoding
  - class_weight parameter, Penalize Models
  - clustering metrics, K-Means
  - clustering models, Agglomerative (Hierarchical) Clustering
  - feature_importances_ attribute, Feature Importance
  - imports with, Imports
  - numeric features with, Clean Data
  - PCA implementation, PCA
  - pipelines, Pipelines-PCA Pipeline
  - recursive feature elimination, Recursive Feature Elimination
- scipy, Normal Residuals, Agglomerative (Hierarchical) Clustering
- scprep, PCA
- scree plot, PCA
- seaborn
- sensitivity (see recall)
- SHapley Additive exPlanations (SHAP), Shapley-Shapley, Explaining Regression Models-Shapley
- silhouette coefficient, K-Means
- simple linear regression, Linear Regression-Linear Regression
- sklearn
  - classification metrics implementation, Metrics
  - coefficient of determination, Baseline Model
  - data format for, Classification
  - data standardization, Standardize
  - DataFrame from confusion_matrix function, Confusion Matrix
  - downsampling majority classes, Downsampling Majority
  - for confusion matrix, Confusion Matrix
  - Laplace smoothing with, Naive Bayes
  - methods implemented by type models, Classification
  - model optimization, Optimize Model
  - mutual information determination, Mutual Information
  - Naive Bayes classes, Naive Bayes
  - regression model evaluation, Metrics-Metrics
  - scaling data to range, Scale to Range
  - SVM implementations in, Support Vector Machine
  - tree interpretation, Tree Interpretation
  - upsampling minority class, Upsampling Minority
- SME (see subject matter expert)
- splits, Gradient Boosted with LightGBM
- stacking classifier, Stacking
- standardizing data, Clean Data, Normalize Data, Standardize
- star imports, avoiding, Imports
- stratified sampling, Class Balance
- strings, pulling categories from, Pulling Categories from Strings-Pulling Categories from Strings
- subject matter expert (SME)
- summary statistics, Summary Stats
- supervised learning
- support vector machines (SVMs), Support Vector Machine-Support Vector Machine, SVMs-SVMs
- surrogate models, Surrogate Models, Understanding Clusters
- Synthetic Minority Over-sampling Technique (SMOTE), Generate Minority Data
T
- t-Distributed Stochastic Neighbor Embedding (t-SNE), t-SNE-t-SNE
- Titanic dataset, classification walkthrough with, Classification Walkthrough: Titanic Dataset-Deploy Model
- TPOT, TPOT-TPOT
- training data, Learning Curve
- transductive algorithms, Imputing Data
- tree interpretation, Tree Interpretation
- tree-based algorithms, Tree-based Algorithms and Ensembles
- true positives, ROC
- type 1/type 2 errors, Confusion Matrix
- types, for storage of columns of data, Clean Data
Y
- Yellowbrick
  - class prediction error plot, Class Prediction Error
  - class size bar plot, Class Balance
  - classification report, Classification Report
  - coefficient visualization, Logistic Regression, Linear Regression
  - confusion matrix, Confusion Matrix, Confusion Matrix
  - correlation heat map, Collinear Columns
  - discrimination threshold visualization, Discrimination Threshold
  - feature importance for XGBoost, XGBoost
  - feature importance visualization, Decision Tree, XGBoost Regression
  - imports with, Imports
  - learning curve plot, Learning Curve
  - pairwise comparisons, Correlation-Correlation
  - parallel coordinates plot, Parallel Coordinates
  - prediction error plot, Prediction Error Plot
  - RadViz plot, RadViz
  - residuals plot, Residuals Plot
  - ROC curve, ROC
  - scatter plot, Joint Plot
  - scatter plot for 3D PCA, PCA
  - silhouette score visualizer, K-Means
  - validation curve report, Validation Curve-Validation Curve