BAESENS, BART. (2014). Analytics in a Big Data World, The Essential Guide to Data Science and Its Applications. Wiley India Pvt. Ltd.
MAYER-SCHONBERGER, VIKTOR & CUKIER, KENNETH. (2013). Big Data, A Revolution That Will Transform How We Live, Work and Think. John Murray (Publishers), Great Britain.
LINDSTROM, MARTIN. (2016). Small Data – The Tiny Clues That Uncover Huge Trends. Hodder & Stoughton, Great Britain.
FREEDMAN, DAVID; PISANI, ROBERT & PURVES, ROGER. (2013). Statistics. Viva Books Private Limited, New Delhi.
LEVINE, DAVID.M. (2011). Statistics for SIX SIGMA Green Belts. Dorling Kindersley (India) Pvt. Ltd., Noida, India.
DONNELLY, JR. ROBERT.A. (2007). The Complete Idiot’s Guide to Statistics, 2/e. Penguin Group (USA) Inc., New York 10014, USA.
TEETOR, PAUL. (2014). R Cookbook. Shroff Publishers and Distributors Pvt. Ltd., Navi Mumbai.
WITTEN, IAN.H.; FRANK, EIBE & HALL, MARK.A. (2014). Data Mining, 3/e – Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, Burlington, MA 01803, USA.
HARRINGTON, PETER. (2015). Machine Learning in Action. Dreamtech Press, New Delhi.
ZUMEL, NINA & MOUNT, JOHN. (2014). Practical Data Science with R. Dreamtech Press, New Delhi.
KABACOFF, ROBERT.I. (2015). R In Action – Data analysis and graphics with R. Dreamtech Press, New Delhi.
[Online] www.quora.com .
[Online] www.r-bloggers.com .
[Online] www.stackexchange.com .
[Online] https://cran.r-project.org/ .
[Online] www.r-project.org/ .
COMPUTERWORLD FROM IDG. (2016). 8 big trends in big data analysis. [Online] Available from: http://www.computerworld.com/article/2690856/big-data/8-big-trends-in-big-data-analytics.html
WELLESLEY INFORMATION SERVICES, MA 02026, USA. (2016). Big Data Analytics Predictions for 2016. Available from: http://data-informed.com/big-data-analytics-predictions-2016/
COMPUTERWORLD FROM IDG. (2016). 11 Market Trends in Advanced Analytics. [Online] Available from: http://www.computerworld.com/article/2489750/it-management/11-market-trends-in-advanced-analytics.html#tk.drr_mlt
WELLESLEY INFORMATION SERVICES, MA 02026, USA. (2016). 5 Big Trends to Watch in 2016. [Online] Available from: http://data-informed.com/5-big-data-trends-watch-2016/ .
ZHANG, NANCY.R. Ridge Regression, LARS, Logistic Regression. [Online] Available from: http://statweb.stanford.edu/~nzhang/203_web/lecture12_2010.pdf
QIAN, JUNYANG & HASTIE, TREVOR. (2014). Glmnet Vignette. [Online] Available from: http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html
USUELLI, MICHELE. (2014). R Machine Learning Essentials. Packt Publishing.
BALI, RAGHAV & SARKAR, DIPANJAN. (2016). R Machine Learning By Example. Packt Publishing.
CHIU, YU-WEI (DAVID). (2015). Machine Learning with R Cookbook. Packt Publishing.
LANTZ, BRETT. (2015). Machine Learning with R, 2/e. Packt Publishing.
Data Mining - Concepts and Techniques By Jiawei Han, Micheline Kamber and Jian Pei, 3e, Morgan Kaufmann
S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. VLDB’96
D. Agrawal, A. E. Abbadi, A. Singh, and T. Yurek. Efficient view maintenance in data warehouses. SIGMOD’97
R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. ICDE’97
S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26:65-74, 1997
E. F. Codd, S. B. Codd, and C. T. Salley. Beyond decision support. Computer World, 27, July 1993.
J. Gray, et al. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.
Swift, Ronald S. (2001) Accelerating Customer Relationships Using CRM and Relationship Technologies, Prentice Hall
Berry, M. J. A., Linoff, G. S. (2004) Data Mining Techniques. Wiley Publishing.
Ertek, G. Visual Data Mining with Pareto Squares for Customer Relationship Management (CRM) (working paper, Sabancı University, Istanbul, Turkey)
Ertek, G., Demiriz, A. A framework for visualizing association mining results (accepted for LNCS)
Hughes, A. M. Quick profits with RFM analysis. http://www.dbmarketing.com/articles/Art149.htm
Kumar, V., Reinartz, W. J. (2006) Customer Relationship Management, A Databased Approach. John Wiley & Sons Inc.
Spence, R. (2001) Information Visualization. ACM Press.
Dyche, Jill, The CRM Guide to Customer Relationship Management, Addison-Wesley, Boston, 2002.
Gordon, Ian. “Best Practices: Customer Relationship Management” Ivey Business Journal Online, 2002, pp. 1-6.
Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, By Galit Shmueli, Nitin R. Patel and Peter C. Bruce
A. Gupta and I. S. Mumick. Materialized Views: Techniques, Implementations, and Applications. MIT Press, 1999.
J. Han. Towards on-line analytical mining in large databases. ACM SIGMOD Record, 27:97-107, 1998.
V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. SIGMOD’96
C. Imhoff, N. Galemmo, and J. G. Geiger. Mastering Data Warehouse Design: Relational and Dimensional Techniques. John Wiley, 2003
W. H. Inmon. Building the Data Warehouse. John Wiley, 1996
R. Kimball and M. Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. 2ed. John Wiley, 2002
P. O’Neil and D. Quass. Improved query performance with variant indexes. SIGMOD'97
Microsoft. OLEDB for OLAP programmer’s reference version 1.0. In http://www.microsoft.com/data/oledb/olap , 1998
A. Shoshani. OLAP and statistical databases: Similarities and differences. PODS’00.
S. Sarawagi and M. Stonebraker. Efficient organization of large multidimensional arrays. ICDE'94
OLAP council. MDAPI specification version 2.0. In http://www.olapcouncil.org/research/apily.htm , 1998
E. Thomsen. OLAP Solutions: Building Multidimensional Information Systems. John Wiley, 1997
P. Valduriez. Join indices. ACM Trans. Database Systems, 12:218-246, 1987.
J. Widom. Research problems in data warehousing. CIKM’95.
Kurt Thearling. Data Mining. http://www.thearling.com
Building Data Mining Applications for CRM by Alex Berson, Stephen Smith, Kurt Thearling (McGraw Hill, 2000).
Introduction to Data Mining, By Pang-Ning Tan, Michael Steinbach, Vipin Kumar, 2006 Pearson Addison-Wesley.
Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, 2000 (c) Morgan Kaufmann Publishers
Data Mining In Excel, Galit Shmueli, Nitin R. Patel, Peter C. Bruce, 2005
Principles of Data Mining by David Hand, Heikki Mannila and Padhraic Smyth ISBN: 026208290x The MIT Press © 2001 (546 pages)
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
http://paginas.fe.up.pt/~ec/files_1011/week%2008%20-%20Decision%20Trees.pdf
Creswell, J. W. (2013). Research design: Qualitative, quantitative, and mixed methods approaches. Sage Publications, Incorporated.
Advanced Data Mining Techniques, Olson, D.L, Delen, D, 2008 Springer
Phyu, Nu Thair, “Survey of Classification Techniques in Data Mining”, Proceedings of the International MultiConference of Engineers and Computer Scientists 2009 Vol I IMECS 2009, March 18 - 20, 2009, Hong Kong
Myatt, J. Glenn, “Making Sense of Data – A practical Guide to Exploratory Data Analysis and Data Mining”, 2007, WILEY-INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION
Fawcett, Tom, “An Introduction to ROC analysis”, Pattern Recognition Letters 27 (2006) 861–874
Sayad, Saeed. “An Introduction to Data Mining”, Self-Help Publishers (January 5, 2011).
Delmater, Rhonda, and Monte Hancock. "Data mining explained." (2001).
Alper, Theodore M. "A classification of all order-preserving homeomorphism groups of the reals that satisfy finite uniqueness." Journal of mathematical psychology 31.2 (1987): 135-154.
Narens, Louis. "Abstract measurement theory." (1985).
Luce, R. Duncan, and John W. Tukey. "Simultaneous conjoint measurement: A new type of fundamental measurement." Journal of mathematical psychology 1.1 (1964): 1-27.
Provost, Foster J., Tom Fawcett, and Ron Kohavi. "The case against accuracy estimation for comparing induction algorithms." ICML. Vol. 98. 1998.
Hanley, James A., and Barbara J. McNeil. "The meaning and use of the area under a receiver operating characteristic (ROC) curve." Radiology 143.1 (1982): 29-36.
Ducker, Sophie Charlotte, W. T. Williams, and G. N. Lance. "Numerical classification of the Pacific forms of Chlorodesmis (Chlorophyta)." Australian Journal of Botany 13.3 (1965): 489-499.
Kaufman, Leonard, and Peter J. Rousseeuw. "Partitioning around medoids (program pam)." Finding groups in data: an introduction to cluster analysis (1990): 68-125.
Affinity analysis
Aggregate() function
Akaike information criterion (AIC) value
Amazon
Apache Hadoop ecosystem
Apache Hadoop YARN
Apache HBase
Apache Hive
Apache Mahout
Apache Oozie
Apache Pig
Apache Spark
Apache Storm
Apply() function
Arrays, R
Artificial intelligence
Association-rule analysis
association rules
if-then
interpreting results
market-basket analysis
rules
support
Association rules/affinity analysis
Bar plot
Bayes theorem
Bias-variance errors
Big data
analysis
analytics, future trends
addressing security and compliance
artificial intelligence
autonomous services for machine learning
business users
cloud
data lakes
growth of social media
healthcare
in-database analytics
in-memory analytics
Internet of Things
migration of solutions
prescriptive analytics
real-time analytics
vertical and horizontal applications
visualization at business users
whole data processing
characteristics
ecosystem
use of
Big data analytics
Binomial distribution
Bivariate data analysis
Bootstrap aggregating/bagging
Boxplots
Business analytics
applications of
customer service and support areas
human resources
marketing and sales
product design
service design
computer packages and applications
consolidate data from various sources
drivers for
framework for
infinite storage and computing capability
life cycle of project
programming tools and platforms
required skills for business analyst
data analysis techniques and algorithms
data structures and storage/warehousing techniques
programming knowledge
statistical and mathematical concepts
Business Analytics and Statistical Tools
Business analytics process
data collection and integration
data warehouse
HR and finance functions
IT database
manufacturing and production process
metadata
NoSQL databases
operational database
primary source
sampling technique
secondary source
variable selection
definition
deployment
functions
collection and integration
deployment
evaluation
exploration and visualization
management and review report
modeling techniques and algorithms
preprocessing
problem, objectives, and requirements
historical data
identifying and understanding problem
life cycle
management report and review
data cleaning carried out
data set use
deployment and usage
issues handling
model creation
prerequisites
problem description
model evaluation
confusion matrix
gain/lift charts
holdout partition
k-fold cross-validation
ROC chart
test data
validation
model evaluation
training
preprocessing
real-time data
regression model
root-mean-square error
sequence of phases
techniques and algorithms
data types
descriptive analytics
machine learning
predictive analytics
Classification techniques
decision tree
disadvantage
k-nearest neighbor (K-NN)
probabilistic models
advantages and limitations
bank credit-card approval process
Naïve Bayes
R
cross-validation error
CSV format
functions
misclassification error
plotting deviance vs. size
school data set
testing model
training set and test set
tree() package
random forests
step process
types
Cloud
Cloudera
Clustering analysis
average linkage (average distance)
categorical variable
centroid distance
complete linkage (maximum distance)
Euclidean distance
finance
hierarchical clustering
algorithm
dendrograms
limitations
hierarchical method
HR department
Manhattan distance
market segmentation
measures distance (between clusters)
mixed data types
n records
nonhierarchical clustering
nonhierarchical method
overview
Pearson product-moment correlation
purpose of
single linkage (minimum distance)
Coefficient of determination
Comma-Separated Values (CSV)
Computations on data frames
analyses
EmpData data
in R
scatter plots
Continuous data
Control structures in R
for loops
if-else
looping functions
apply() function
cut() function
lapply() function
sapply() function
split() function
tapply() function
while loops
writing functions
Correlation
Correlation coefficient
Correlation graph
Cross-Industry Standard Process for Data Mining (CRISP-DM)
Cut() function
Cutree() function
Data
Data aggregation
Data analysis, R
reading and writing data
from Microsoft Excel file
from text file
from web
Data analysis tools
Data analytics
Data exploration and visualization
descriptive statistics
goal of
graphs
box/whisker plot
correlation
density function
histograms
notched plots
registered users vs. casual users
scatter plot matrices
scatter plots
trellis plot
types of
univariate analysis
normalization techniques
phase
tables
transformation
View() command
Data frames, R
Data lakes
Data Mining Group (DMG)
Data science
Data structures
in R
arrays
data frames
factors
lists
matrices
Decision tree structure
bias and variance
classification rules
data tuples
entropy/expected information
generalization errors
gini index
impurity
induction
information gain
overfitting and underfitting
overfitting errors
CART method
pruning process
regression trees
tree growth
recursive divide-and-conquer approach
root node
Deep learning
Dendrograms
Density function
Descriptive analytics
computations on dataframes
graphical
Maximum depth of river
mean depth of the river
median of the depth of river
notice, sign board
percentile
population and sample
probability
quartile 3
statistical parameters
Discrete data types
Durbin-Watson test
Economic globalization
Ecosystem, big data
Euclidean distance
Extensible Markup Language (XML)
Factors, R
for loops
Graphical description of data
bar plot
boxplot
histogram
plots in R
code
creation, simple plot
plot()
variants
Gross domestic product (GDP)
Hadoop Distributed File System (HDFS)
Hadoop ecosystem
advantages
Hadoop framework
Healthcare, big data
Hierarchical clustering
algorithm
closeness
dendrograms
limitations
Histograms
Huge computing power
Huge storage power
Hybrid Transactional/Analytical Processing (HTAP)
Hypothesis testing
If-else structure
In-database analytics
In-memory analytics
Integrated development environment (IDE)
Internet of Things
Interquartile Range (IQR) method
Interval data types
JavaScript Object Notation (JSON) files
JobTracker
k-fold cross-validation
K-means algorithm
case study
outliers verification
relevant variables
scores() function
standardized values
test data set
data points (observations)
aggregate() function
cutree() function
data observations
dendrogram
dist() function
hclust() function
hierarchical partitioning approach
library(NbClust) command
NbClust() command
observations
plot() function
rect.hclust() function
rent and distances
selected approaches
goal
k-means algorithm
limitations
objective of
partition clustering methods
kmeansruns() function
k-nearest neighbor (K-NN)
lapply() function
Lasso Regression method
Linear regression
assumptions
correlation
attrition
cause-and-effect relationship
coefficient
customer satisfaction
employee satisfaction index
sales quantum
strong/weak association
data frame creation
degrees of freedom
equal variance, variable
equation
F-statistic
function
independent and dependent variable
innovativeness
intercept
least squares method
linear relationship
marketing efforts
multiple R-squared
predict() function
profitability
properties
p-value
quality-related statistics
R command
residuals
residual standard error
sales personnel
standard error
testing
independence errors
linearity
normality
validation
crPlots(model name) function
gvlma() function
scale-location plot
value of significance
work environment
Lists, R
Logistic regression
binomial distribution
data creation
glm() function
lm() function
logistic regression model
model creation
comparison
conclusion
deviance
dispersion
glm() function
model fit verification
multicollinearity
residual deviance
summary of
variables
warning message
multinomial logistic regression
read.csv() command
regularization
training and testing
prediction() function
response variable
validation
Looping functions
apply() function
cut() function
lapply() function
sapply() function
split() function
tapply() function
Machine learning
Manhattan distance
MapReduce
Market-basket analysis (MBA)
Matrices, R
Measurable data
Microsoft Azure
Microsoft Business Intelligence and Tableau
Microsoft Excel file, reading data
Microsoft SQL Server database
Minkowski distance
Min-max normalization
Mtcars Data Set
Multicollinearity
Multinomial logistic regression
Multiple linear regression
assumptions
components
correlation
data
data-frame format
discrete variables
equation
lm() function
multicollinearity
predictors
response variable
R function glm()
stepwise
subsets approach
training and testing model
validation
crPlots
Durbin-Watson test
ncvTest(model name)
normal Q-Q plot
qqPlot
residuals vs. fitted
residuals vs. leverage plot
scale-location plot
Shapiro-Wilk normality test
multiple linear regression equation
Multiple regression
myFun() function
Naïve Bayes
Natural language processing (NLP)
NbClust() function
Nominal data types
Nonhierarchical clustering
Non-linear regression
Normal distribution
Normalization techniques
NoSQL
Null hypothesis
Online analytical processing (OLAP)
Open Database Connectivity (ODBC)
Ordinal data types
Overdispersion
Packages and libraries, R
Partition clustering methods
Poisson distribution
Prediction
Predictive analytics
classification
regression
Predictive Model Markup Language (PMML)
Preprocessing data
preparation
duplicate, junk, and null characters
empty values
handling missing values
R
as.numeric() function
complete.cases() function
data types
factor levels
factor() type
head() command
methods
missing values
names() and c() function
table() function
vector operations
types
Probabilistic classification
advantages and limitations
bank credit-card approval process
Naïve Bayes
Probability
concepts
distributions
events
mutually exclusive events
mutually independent events
mutually non-exclusive events
Probability distributions
binomial
normal
poisson
Probability sampling
Property graphs (PG)
Qualitative data
Quantitative data
R
advantages
console
control structures
for loops
if-else
looping functions
while loops
writing functions
data analysis
reading and writing data
data analysis tools
data structures
arrays
data frames
factors
lists
matrices
glm() function
installation
RStudio interface
interfaces
library(NbClust) command
lm() function
Naïve Bayes
objects types
packages and libraries
pairs() command
programming, basics
assigning values
creating vector
View() command
Random forests
Random sampling
Ratio data types
read.csv() function
read.table() function
Receiver operating characteristic (ROC)
rect.hclust() function
Regularization
cv.fit() model
cv.glmnet() function
generic format
glmnet() function
glmnet_fit command
methods
plot() function
plot(cv.fit)
predict() function
print() function
shrinkage methods
variable
Ridge Regression method
RODBC package
Root-mean-square error (RMSE)
RStudio
installation error
installing
interface
output
window
sapply() function
Scatter plot matrices
Scatter plots
analysis of data
changes, relationship
Coding
created in R
EmpData1
seq_along() function
Shrinkage methods
Simple regression
split() function
Standard deviations
Statistical parameters
mean
data set
downside of
in R
limitations
profit and effective
single parameter
usage of
median
mode
quantiles
range
standard deviation
summary(dataset)
variance
Storm
Stratified sampling
Supervised machine learning
Systematic sampling
tapply() function
Text file, reading data
Transformation
Trellis graphics
Univariate analysis
Unsupervised machine learning
association-rule analysis
association rules
if-then
interpreting results
market-basket analysis
rules
support
clustering
Variance errors
Variance inflation factor (VIF)
Variety
Velocity
Visualization
Workflow
Visualization
Web, reading data
while loops
Whole data processing
Z-score normalization