  1. Affinity analysis

  2. Aggregate() function

  3. Akaike information criterion (AIC) value

  4. Amazon

  5. Apache Hadoop ecosystem

  6. Apache Hadoop YARN

  7. Apache HBase

  8. Apache Hive

  9. Apache Mahout

  10. Apache Oozie

  11. Apache Pig

  12. Apache Spark

  13. Apache Storm

  14. Apply() function

  15. Arrays, R

  16. Artificial intelligence

  17. Association-rule analysis

    1. association rules

    2. if-then

    3. interpreting results

    4. market-basket analysis

    5. rules

    6. support

  18. Association rules/affinity analysis


  1. Bar plot

  2. Bayes theorem

  3. Bias-variance errors

  4. Big data

    1. analysis

    2. analytics, future trends

      1. addressing security and compliance

      2. artificial intelligence

      3. autonomous services for machine learning

      4. business users

      5. cloud

      6. data lakes

      7. growth of social media

      8. healthcare

      9. in-database analytics

      10. in-memory analytics

      11. Internet of Things

      12. migration of solutions

      13. prescriptive analytics

      14. real-time analytics

      15. vertical and horizontal applications

      16. visualization for business users

      17. whole data processing

    3. characteristics

    4. ecosystem

    5. use of

  5. Big data analytics

  6. Binomial distribution

  7. Bivariate data analysis

  8. Bootstrap aggregating/bagging

  9. Boxplots

  10. Business analytics

    1. applications of

      1. customer service and support areas

      2. human resources

      3. marketing and sales

      4. product design

      5. service design

    2. computer packages and applications

    3. consolidate data from various sources

    4. drivers for

    5. framework for

    6. infinite storage and computing capability

    7. life cycle of project

    8. programming tools and platforms

    9. required skills for business analyst

      1. data analysis techniques and algorithms

      2. data structures and storage/warehousing techniques

      3. programming knowledge

      4. statistical and mathematical concepts

  11. Business Analytics and Statistical Tools

  12. Business analytics process

    1. data collection and integration

      1. data warehouse

      2. HR and finance functions

      3. IT database

      4. manufacturing and production process

      5. metadata

      6. NoSQL databases

      7. operational database

      8. primary source

      9. sampling technique

      10. secondary source

      11. variable selection

    2. definition

    3. deployment

    4. functions

      1. collection and integration

      2. deployment

      3. evaluation

      4. exploration and visualization

      5. management and review report

      6. modeling techniques and algorithms

      7. preprocessing

      8. problem, objectives, and requirements

    5. historical data

    6. identifying and understanding problem

    7. life cycle

    8. management report and review

      1. data cleaning carried out

      2. data set use

      3. deployment and usage

      4. issues handling

      5. model creation

      6. prerequisites

      7. problem description

    9. model evaluation

      1. confusion matrix

      2. gain/lift charts

      3. holdout partition

      4. k-fold cross-validation

      5. ROC chart

      6. test data

      7. validation

    10. model evaluation

      1. training

    11. preprocessing

See Preprocessing data
  1. real-time data

  2. regression model

  3. root-mean-square error

  4. sequence of phases

  5. techniques and algorithms

    1. data types

    2. descriptive analytics

    3. machine learning

    4. predictive analytics


  1. Classification techniques

    1. decision tree

See Decision tree structure
  1. disadvantage

  2. k-nearest neighbor (K-NN)

  3. probabilistic models

    1. advantages and limitations

    2. bank credit-card approval process

    3. Naïve Bayes

  4. R

    1. cross-validation error

    2. CSV format

    3. functions

    4. misclassification error

    5. plotting deviance vs. size

    6. school data set

    7. testing model

    8. training set and test set

    9. tree() package

  5. random forests

  6. step process

  7. types

  1. Cloud

  2. Cloudera

  3. Clustering analysis

    1. average linkage (average distance)

    2. categorical variable

    3. centroid distance

    4. complete linkage (maximum distance)

    5. Euclidean distance

    6. finance

    7. hierarchical clustering

      1. algorithm

      2. dendrograms

      3. limitations

    8. hierarchical method

    9. HR department

    10. Manhattan distance

    11. market segmentation

    12. measures distance (between clusters)

    13. mixed data types

    14. n records

    15. nonhierarchical clustering

See K-means algorithm
  1. nonhierarchical method

  2. overview

  3. Pearson product correlation

  4. purpose of

  5. single linkage (minimum distance)

  1. Coefficient of determination

  2. Comma-Separated Values (CSV)

  3. Computations on data frames

    1. analyses

    2. EmpData data

    3. in R

    4. scatter plots

  4. Continuous data

  5. Control structures in R

    1. for loops

    2. if-else

    3. looping functions

      1. apply() function

      2. cut() function

      3. lapply() function

      4. sapply() function

      5. split() function

      6. tapply() function

    4. while loops

    5. writing functions

  6. Correlation

  7. Correlation coefficient

  8. Correlation graph

  9. Cross-Industry Standard Process for Data Mining (CRISP-DM)

  10. Cut() function

  11. Cutree() function


  1. Data

  2. Data aggregation

  3. Data analysis, R

    1. reading and writing data

      1. from Microsoft Excel file

      2. from text file

      3. from web

  4. Data analysis tools

  5. Data analytics

  6. Data exploration and visualization

    1. descriptive statistics

    2. goal of

    3. graphs

      1. box/whisker plot

      2. correlation

      3. density function

      4. histograms

      5. notched plots

      6. registered users vs. casual users

      7. scatter plot matrices

      8. scatter plots

      9. trellis plot

      10. types of

      11. univariate analysis

    4. normalization techniques

    5. phase

    6. tables

    7. transformation

    8. View() command

  7. Data frames, R

  8. Data lakes

  9. Data Mining Group (DMG)

  10. Data science

  11. Data structures

    1. in R

      1. arrays

      2. data frames

      3. factors

      4. lists

      5. matrices

  12. Decision tree structure

    1. bias and variance

    2. classification rules

    3. data tuples

    4. entropy/expected information

    5. generalization errors

    6. Gini index

    7. impurity

    8. induction

    9. information gain

    10. overfitting and underfitting

    11. overfitting errors

      1. CART method

      2. pruning process

      3. regression trees

      4. tree growth

    12. recursive divide-and-conquer approach

    13. root node

  13. Deep learning

  14. Dendrograms

  15. Density function

  16. Descriptive analytics

    1. computations on dataframes

See Computations on data frames
  1. graphical

See Graphical description of data
  1. maximum depth of river

  2. mean depth of the river

  3. median of the depth of river

  4. notice, sign board

  5. percentile

  6. population and sample

  7. probability

  8. quartile 3

  9. statistical parameters

See Statistical parameters
  1. Discrete data types

  2. Durbin-Watson test


  1. Economic globalization

  2. Ecosystem, big data

  3. Euclidean distance

  4. Extensible Markup Language (XML)


  1. Factors, R

  2. for loops


  1. Graphical description of data

    1. bar plot

    2. boxplot

    3. histogram

    4. plots in R

      1. code

      2. creation, simple plot

      3. plot()

      4. variants

  2. Gross domestic product (GDP)


  1. Hadoop Distributed File System (HDFS)

  2. Hadoop ecosystem

    1. advantages

  3. Hadoop framework

  4. Healthcare, big data

  5. Hierarchical clustering

    1. algorithm

    2. closeness

    3. dendrograms

    4. limitations

  6. Histograms

  7. Huge computing power

  8. Huge storage power

  9. Hybrid Transactional/Analytical Processing (HTAP)

  10. Hypothesis testing


  1. If-else structure

  2. In-database analytics

  3. In-memory analytics

  4. Integrated development environment (IDE)

  5. Internet of Things

  6. Interquartile Range (IQR) method

  7. Interval data types


  1. JavaScript Object Notation (JSON) files

  2. JobTracker


  1. k-fold cross-validation

  2. K-means algorithm

    1. case study

      1. outliers verification

      2. relevant variables

      3. scores() function

      4. standardized values

      5. test data set

    2. data points (observations)

      1. aggregate() function

      2. cutree() function

      3. data observations

      4. dendrogram

      5. dist() function

      6. hclust() function

      7. hierarchical partitioning approach

      8. library(NbClust) command

      9. NbClust() command

      10. observations

      11. plot() function

      12. rect.hclust() function

      13. rent and distances

      14. selected approaches

    3. goal

    4. k-means algorithm

    5. limitations

    6. objective of

    7. partition clustering methods

  3. kmeansruns() function

  4. k-nearest neighbor (K-NN)


  1. lapply() function

  2. Lasso Regression method

  3. Linear regression

    1. assumptions

    2. correlation

      1. attrition

      2. cause-and-effect relationship

      3. coefficient

      4. customer satisfaction

      5. employee satisfaction index

      6. sales quantum

      7. strong/weak association

    3. data frame creation

    4. degrees of freedom

    5. equal variance, variable

    6. equation

    7. F-statistic

    8. function

    9. independent and dependent variable

    10. innovativeness

    11. intercept

    12. least squares method

    13. linear relationship

    14. marketing efforts

    15. multiple R-squared

    16. predict() function

    17. profitability

    18. properties

    19. p-value

    20. quality-related statistics

    21. R command

    22. residuals

    23. residual standard error

    24. sales personnel

    25. standard error

    26. testing

      1. independence errors

      2. linearity

      3. normality

    27. validation

      1. crPlots(model name) function

      2. gvlma() function

      3. scale-location plot

    28. value of significance

    29. work environment

  4. Lists, R

  5. Logistic regression

    1. binomial distribution

    2. data creation

    3. glm() function

    4. lm() function

    5. logistic regression model

    6. model creation

      1. comparison

      2. conclusion

      3. deviance

      4. dispersion

      5. glm() function

      6. model fit verification

      7. multicollinearity

      8. residual deviance

      9. summary of

      10. variables

      11. warning message

    7. multinomial logistic regression

    8. read.csv() command

    9. regularization

  1. training and testing

    1. prediction() function

    2. response variable

    3. validation

  1. Looping functions

    1. apply() function

    2. cut() function

    3. lapply() function

    4. sapply() function

    5. split() function

    6. tapply() function


  1. Machine learning

  2. Manhattan distance

  3. MapReduce

  4. Market-basket analysis (MBA)

See Affinity analysis
  1. Matrices, R

  2. Measurable data

See Quantitative data
  1. Microsoft Azure

  2. Microsoft Business Intelligence and Tableau

  3. Microsoft Excel file, reading data

  4. Microsoft SQL Server database

  5. Minkowski distance

  6. Min-max normalization

  7. Mtcars Data Set

  8. Multicollinearity

  9. Multinomial logistic regression

  10. Multiple linear regression

    1. assumptions

    2. components

    3. correlation

    4. data

    5. data-frame format

    6. discrete variables

    7. equation

    8. lm() function

    9. multicollinearity

    10. predictors

    11. response variable

    12. R function glm()

    13. stepwise

    14. subsets approach

    15. training and testing model

    16. validation

      1. crPlots

      2. Durbin-Watson test

      3. ncvTest(model name)

      4. normal Q-Q plot

      5. qqPlot

      6. residuals vs. fitted

      7. residuals vs. leverage plot

      8. scale-location plot

      9. Shapiro-Wilk normality test

  11. multiple linear regression equation

See Multiple linear regression
  1. Multiple regression

  2. myFun() function


  1. Naïve Bayes

  2. Natural language processing (NLP)

  3. NbClust() function

  4. Nominal data types

  5. Nonhierarchical clustering

See K-means algorithm
  1. Non-linear regression

  2. Normal distribution

  3. Normalization techniques

  4. NoSQL

  5. Null hypothesis


  1. Online analytical processing (OLAP)

  2. Open Database Connectivity (ODBC)

  3. Ordinal data types

  4. Overdispersion


  1. Packages and libraries, R

  2. Partition clustering methods

  3. Poisson distribution

  4. Prediction

  5. Predictive analytics

    1. classification

    2. regression

  6. Predictive Model Markup Language (PMML)

  7. Preprocessing data

    1. preparation

      1. duplicate, junk, and null characters

      2. empty values

      3. handling missing values

    2. R

      1. as.numeric() function

      2. complete.cases() function

      3. data types

      4. factor levels

      5. factor() type

      6. head() command

      7. methods

      8. missing values

      9. names() and c() function

      10. table() function

      11. vector operations

    3. types

  8. Probabilistic classification

    1. advantages and limitations

    2. bank credit-card approval process

    3. Naïve Bayes

  9. Probability

    1. concepts

    2. distributions

See Probability distributions
  1. events

  2. mutually exclusive events

  3. mutually independent events

  4. mutually non-exclusive events

  1. Probability distributions

    1. binomial

    2. normal

    3. Poisson

  2. Probability sampling

  3. Property graphs (PG)


  1. Qualitative data

  2. Quantitative data


  1. R

    1. advantages

    2. console

    3. control structures

      1. for loops

      2. if-else

      3. looping functions

      4. while loops

      5. writing functions

    4. data analysis

      1. reading and writing data

    5. data analysis tools

    6. data structures

      1. arrays

      2. data frames

      3. factors

      4. lists

      5. matrices

    7. glm() function

    8. installation

      1. RStudio interface

    9. interfaces

    10. library(NbClust) command

    11. lm() function

    12. Naïve Bayes

    13. object types

    14. packages and libraries

    15. pairs() command

    16. programming, basics

      1. assigning values

      2. creating vector

    17. View() command

  2. Random forests

  3. Random sampling

  4. Ratio data types

  5. read.csv() function

  6. read.table() function

  7. Receiver operating characteristic (ROC)

  8. rect.hclust() function

  9. Regularization

    1. model

    2. cv.glmnet() function

    3. generic format

    4. glmnet() function

    5. glmnet_fit command

    6. methods

    7. plot() function

    8. plot(

    9. predict() function

    10. print() function

    11. shrinkage methods

    12. variable

  10. Ridge Regression method

  11. RODBC package

  12. Root-mean-square error (RMSE)

  13. RStudio

    1. installation error

    2. installing

    3. interface

    4. output

    5. window


  1. sapply() function

  2. Scatter plot matrices

  3. Scatter plots

    1. analysis of data

    2. changes, relationship

    3. Coding

    4. created in R

    5. EmpData1

  4. seq_along() function

  5. Shrinkage methods

  6. Simple regression

  7. split() function

  8. Standard deviations

  9. Statistical parameters

    1. mean

      1. data set

      2. downside of

      3. in R

      4. limitations

      5. profit and effective

      6. single parameter

      7. usage of

    2. median

    3. mode

    4. quantiles

    5. range

    6. standard deviation

    7. summary(dataset)

    8. variance

  10. Storm

  11. Stratified sampling

  12. Supervised machine learning

  13. Systematic sampling


  1. tapply() function

  2. Text file, reading data

  3. Transformation

  4. Trellis graphics


  1. Univariate analysis

  2. Unsupervised machine learning

    1. association-rule analysis

      1. association rules

      2. if-then

      3. interpreting results

      4. market-basket analysis

      5. rules

      6. support

    2. clustering

See Clustering analysis


  1. Variance errors

  2. Variance inflation factor (VIF)

  3. Variety

  4. Velocity

  5. Visualization

    1. Workflow

  6. Visualization

See Data exploration and visualization

W, X, Y

  1. Web, reading data

  2. while loops

  3. Whole data processing


  1. Z-score normalization

