Index
A
AdaBoosting
Akaike information criterion (AIC)
Anaconda-Navigator
Artificial neural networks (ANN)
B
Bagging models
Bag-of-words (BOW) approach
Batch prediction
Bayesian information criterion (BIC)
Bayesian optimization
Bias-variance tradeoff
Binomial distribution
Boosting algorithms
AdaBoosting
definition
final model
gradient boosting
See also Gradient boosting
SVM
See also Support vector machine (SVM)
training dataset
C
CatBoost
Chi-square automatic interaction detection (CHAID)
Classification algorithm
accuracy assessment criteria
applications
credit risk
business context
business objective
dataset
income prediction on census data
k-nearest neighbor
logistic regression
Naïve Bayes
supervised classification algorithms
target variable and independent variables
tree-based algorithms
Cluster analysis
Confusion matrix
Cosine similarity
Cross-entropy
D
Data analysis
Data explosion
Data integrity
Data mining
Data quality
Data science
Data validity
Deep learning
Dual information distance (DID)
E, F
End-to-end model
business problem
data cleaning and preparation
data discovery phase
dataset
categorical variable treatment
common issues
duplicates
imbalance
missing values present
outliers
deployment of ML model
documentation
EDA phase
key business stakeholders
ML model building
ML model development
model refresh and maintenance
overfitting vs. underfitting problem
threshold, classification algorithms
train/test split of data
Euclidean distance
ExtraTreeClassifier
Extreme gradient boosting
G
Gini coefficient
GitHub link
Goldfeld-Quandt test
Gradient boosting
CatBoost
ensemble-based advanced algorithms
extreme gradient boosting
light gradient boosting
properties
Gradient descent
H, I
Hyperparameters
Hypothesis testing
J
Junk characters
Jupyter notebook
K
KD Tree Nearest Neighbor
Kernel SVM (KSVM)
k-fold cross-validation
k-nearest neighbor
Kolmogorov-Smirnov test
L
Lexicon normalization
Library-based cleaning
Light gradient boosting
M
Machine learning (ML)
banking, financial services, and insurance sector
cluster analysis
definition
languages, and tools
manufacturing industry
popularity
preparations
principal components
regression vs. classification problems
retail
semi-supervised algorithms
and software engineering
statistics/mathematics
bias–variance trade-off
binomial distribution
correlation and covariance
descriptive vs. inferential statistics
discrete vs. continuous variable
measures of central tendency
normal/Gaussian distribution
numeric vs. categorical data
parameter vs. statistics
Poisson’s distribution
population vs. sample
vector and matrix
steps and process
supervised learning
telecommunication
unsupervised learning algorithm
Manhattan distance
Mean absolute error (MAE)
Mean squared error (MSE)
Minibatch gradient descent
Multicollinearity
Multiple linear regression
Multivariate adaptive regression splines (MARS)
N
Non-linear regression analysis
Nonparametric model
Null hypothesis
O
One-hot encoding
Open Neural Network Exchange Format (ONNX)
Ordinary least-squares (OLS)
P, Q
Part-of-speech (POS) tagging
Poisson’s distribution
Predictive Model Markup Language (PMML)
Pseudo R square
p-value
Python 3.5
Python code
R
Radius Neighbor Classifier
Random forest
Real-time prediction system
Regression analysis
business solving and decision making
continuous variable
efficacy of regression problem
ensemble methods
feature selection
heteroscedasticity
linear regression
assumptions
benchmark model
correlation
definition
mathematical equation
ML equation
model coefficients
random error
target variable/endogenous variable
vector-space diagram
multicollinearity
multiple linear regression
See also Multiple linear regression
nonlinearities
non-linear regression analysis
outliers
simple linear regression
See also Simple linear regression
statistical algorithms
tree-based algorithms
ReLU activation function
S
Semi-supervised algorithms
Sentiment analysis
Sigmoid activation function
Simple linear regression
creation
for housing dataset
Softmax function
Spam filtering
Stochastic gradient descent
Stop-word removal
Structured data
Supervised algorithms
activation functions
ANN
classification model
deep learning
hyperparameters
image classification model
image data
agriculture
challenges
healthcare
insurance
management process
manufacturing sector
mathematical modeling
retail
security and monitoring
self-driving cars
social media platforms and online marketplaces
neural network training process
optimization function
structured and unstructured datasets
text analytics process
text data
bag-of-words approach
challenges
data cleaning
extraction and management
feature extraction
language translation
ML model
news categorization/document categorization
N-gram and language models
part-of-speech (POS) tagging
sentiment analysis
spam filtering
term-frequency and inverse-document frequency
text summarization
tokenization
word embeddings
Supervised classification algorithms
Supervised learning algorithm
Support vector machine (SVM)
cancer detection case study
kernel SVM
maximum-margin hyperplane
regression and classification problems
structured datasets
2-dimensional space
T
tanh activation function
Top-down greedy approach
Tree-based algorithms
building blocks
C4.5
CHAID
classification and regression tree
classification error
entropy and information gain
Gini coefficient
impure/heterogeneous
Iterative Dichotomizer 3
logistic regression problem
MARS
pure/homogeneous
splitting
transportation mode dependency
U
Unstructured data
Unsupervised learning algorithm
V
Variance inflation factor (VIF)
W, X, Y, Z
Web service
Web service–based deployment process
Word embeddings