Learning Data Mining with R
by Bater Makhabel
Table of Contents
Learning Data Mining with R
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Warming Up
Big data
Scalability and efficiency
Data source
Data mining
Feature extraction
Summarization
The data mining process
CRISP-DM
SEMMA
Social network mining
Social network
Text mining
Information retrieval and text mining
Mining text for prediction
Web data mining
Why R?
What are the disadvantages of R?
Statistics
Statistics and data mining
Statistics and machine learning
Statistics and R
The limitations of statistics on data mining
Machine learning
Approaches to machine learning
Machine learning architecture
Data attributes and description
Numeric attributes
Categorical attributes
Data description
Data measuring
Data cleaning
Missing values
Junk, noisy data, or outlier
Data integration
Data dimension reduction
Eigenvalues and Eigenvectors
Principal-Component Analysis
Singular-value decomposition
CUR decomposition
Data transformation and discretization
Data transformation
Normalization data transformation methods
Data discretization
Visualization of results
Visualization with R
Time for action
Summary
2. Mining Frequent Patterns, Associations, and Correlations
An overview of associations and patterns
Patterns and pattern discovery
The frequent itemset
The frequent subsequence
The frequent substructures
Relationship or rules discovery
Association rules
Correlation rules
Market basket analysis
The market basket model
A-Priori algorithms
Input data characteristics and data structure
The A-Priori algorithm
The R implementation
A-Priori algorithm variants
The Eclat algorithm
The R implementation
The FP-growth algorithm
Input data characteristics and data structure
The FP-growth algorithm
The R implementation
The GenMax algorithm with maximal frequent itemsets
The R implementation
The Charm algorithm with closed frequent itemsets
The R implementation
The algorithm to generate association rules
The R implementation
Hybrid association rules mining
Mining multilevel and multidimensional association rules
Constraint-based frequent pattern mining
Mining sequence dataset
Sequence dataset
The GSP algorithm
The R implementation
The SPADE algorithm
The R implementation
Rule generation from sequential patterns
High-performance algorithms
Time for action
Summary
3. Classification
Classification
Generic decision tree induction
Attribute selection measures
Tree pruning
General algorithm for the decision tree generation
The R implementation
High-value credit card customers classification using ID3
The ID3 algorithm
The R implementation
Web attack detection
High-value credit card customers classification
Web spam detection using C4.5
The C4.5 algorithm
The R implementation
A parallel version with MapReduce
Web spam detection
Web key resource page judgment using CART
The CART algorithm
The R implementation
Web key resource page judgment
Trojan traffic identification method and Bayes classification
Estimating
Prior probability estimation
Likelihood estimation
The Bayes classification
The R implementation
Trojan traffic identification method
Identify spam e-mail and Naïve Bayes classification
The Naïve Bayes classification
The R implementation
Identify spam e-mail
Rule-based classification of player types in computer games and rule-based classification
Transformation from decision tree to decision rules
Rule-based classification
Sequential covering algorithm
The RIPPER algorithm
The R implementation
Rule-based classification of player types in computer games
Time for action
Summary
4. Advanced Classification
Ensemble (EM) methods
The bagging algorithm
The boosting and AdaBoost algorithms
The Random forests algorithm
The R implementation
Parallel version with MapReduce
Biological traits and the Bayesian belief network
The Bayesian belief network (BBN) algorithm
The R implementation
Biological traits
Protein classification and the k-Nearest Neighbors algorithm
The kNN algorithm
The R implementation
Document retrieval and Support Vector Machine
The SVM algorithm
The R implementation
Parallel version with MapReduce
Document retrieval
Classification using frequent patterns
The associative classification
CBA
Discriminative frequent pattern-based classification
The R implementation
Text classification using sentential frequent itemsets
Classification using the backpropagation algorithm
The BP algorithm
The R implementation
Parallel version with MapReduce
Time for action
Summary
5. Cluster Analysis
Search engines and the k-means algorithm
The k-means clustering algorithm
The kernel k-means algorithm
The k-modes algorithm
The R implementation
Parallel version with MapReduce
Search engine and web page clustering
Automatic abstraction of document texts and the k-medoids algorithm
The PAM algorithm
The R implementation
Automatic abstraction and summarization of document text
The CLARA algorithm
The CLARA algorithm
The R implementation
CLARANS
The CLARANS algorithm
The R implementation
Unsupervised image categorization and affinity propagation clustering
Affinity propagation clustering
The R implementation
Unsupervised image categorization
The spectral clustering algorithm
The R implementation
News categorization and hierarchical clustering
Agglomerative hierarchical clustering
The BIRCH algorithm
The chameleon algorithm
The Bayesian hierarchical clustering algorithm
The probabilistic hierarchical clustering algorithm
The R implementation
News categorization
Time for action
Summary
6. Advanced Cluster Analysis
Customer categorization analysis of e-commerce and DBSCAN
The DBSCAN algorithm
Customer categorization analysis of e-commerce
Clustering web pages and OPTICS
The OPTICS algorithm
The R implementation
Clustering web pages
Visitor analysis in the browser cache and DENCLUE
The DENCLUE algorithm
The R implementation
Visitor analysis in the browser cache
Recommendation system and STING
The STING algorithm
The R implementation
Recommendation systems
Web sentiment analysis and CLIQUE
The CLIQUE algorithm
The R implementation
Web sentiment analysis
Opinion mining and WAVE clustering
The WAVE cluster algorithm
The R implementation
Opinion mining
User search intent and the EM algorithm
The EM algorithm
The R implementation
The user search intent
Customer purchase data analysis and clustering high-dimensional data
The MAFIA algorithm
The SURFING algorithm
The R implementation
Customer purchase data analysis
SNS and clustering graph and network data
The SCAN algorithm
The R implementation
Social networking service (SNS)
Time for action
Summary
7. Outlier Detection
Credit card fraud detection and statistical methods
The likelihood-based outlier detection algorithm
The R implementation
Credit card fraud detection
Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods
The NL algorithm
The FindAllOutsM algorithm
The FindAllOutsD algorithm
The distance-based algorithm
The Dolphin algorithm
The R implementation
Activity monitoring and the detection of mobile fraud
Intrusion detection and density-based methods
The OPTICS-OF algorithm
The High Contrast Subspace algorithm
The R implementation
Intrusion detection
Intrusion detection and clustering-based methods
Hierarchical clustering to detect outliers
The k-means-based algorithm
The ODIN algorithm
The R implementation
Monitoring the performance of the web server and classification-based methods
The OCSVM algorithm
The one-class nearest neighbor algorithm
The R implementation
Monitoring the performance of the web server
Detecting novelty in text, topic detection, and mining contextual outliers
The conditional anomaly detection (CAD) algorithm
The R implementation
Detecting novelty in text and topic detection
Collective outliers on spatial data
The route outlier detection (ROD) algorithm
The R implementation
Characteristics of collective outliers
Outlier detection in high-dimensional data
The brute-force algorithm
The HilOut algorithm
The R implementation
Time for action
Summary
8. Mining Stream, Time-series, and Sequence Data
The credit card transaction flow and STREAM algorithm
The STREAM algorithm
The single-pass-any-time clustering algorithm
The R implementation
The credit card transaction flow
Predicting future prices and time-series analysis
The ARIMA algorithm
Predicting future prices
Stock market data and time-series clustering and classification
The hError algorithm
Time-series classification with the 1NN classifier
The R implementation
Stock market data
Web click streams and mining symbolic sequences
The TECNO-STREAMS algorithm
The R implementation
Web click streams
Mining sequence patterns in transactional databases
The PrefixSpan algorithm
The R implementation
Time for action
Summary
9. Graph Mining and Network Analysis
Graph mining
Graph
Graph mining algorithms
Mining frequent subgraph patterns
The gPLS algorithm
The GraphSig algorithm
The gSpan algorithm
Rightmost path extensions and their supports
The subgraph isomorphism enumeration algorithm
The canonical checking algorithm
The R implementation
Social network mining
Community detection and the shingling algorithm
The node classification and iterative classification algorithms
The R implementation
Time for action
Summary
10. Mining Text and Web Data
Text mining and TM packages
Text summarization
Topic representation
The multidocument summarization algorithm
The Maximal Marginal Relevance algorithm
The R implementation
The question answering system
Genre categorization of web pages
Categorizing newspaper articles and newswires into topics
The N-gram-based text categorization
The R implementation
Web usage mining with web logs
The FCA-based association rule mining algorithm
The R implementation
Time for action
Summary
A. Algorithms and Data Structures
Index