Table of Contents for Hands-On Data Science and Python Machine Learning
by Frank Kane
Preface
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Getting Started
Installing Enthought Canopy
Giving the installation a test run
If you occasionally get problems opening your IPYNB files
Using and understanding IPython (Jupyter) Notebooks
Python basics - Part 1
Understanding Python code
Importing modules
Data structures
Experimenting with lists
Pre colon
Post colon
Negative syntax
Adding list to list
The append function
Complex data structures
Dereferencing a single element
The sort function
Reverse sort
Tuples
Dereferencing an element
List of tuples
Dictionaries
Iterating through entries
Python basics - Part 2
Functions in Python
Lambda functions - functional programming
Understanding boolean expressions
The if statement
The if-else statement
Looping
The while loop
Exploring activity
Running Python scripts
More options than just the IPython/Jupyter Notebook
Running Python scripts in command prompt
Using the Canopy IDE
Summary
Statistics and Probability Refresher, and Python Practice
Types of data
Numerical data
Discrete data
Continuous data
Categorical data
Ordinal data
Mean, median, and mode
Mean
Median
The factor of outliers
Mode
Using mean, median, and mode in Python
Calculating mean using the NumPy package
Visualizing data using matplotlib
Calculating median using the NumPy package
Analyzing the effect of outliers
Calculating mode using the SciPy package
Some exercises
Standard deviation and variance
Variance
Measuring variance
Standard deviation
Identifying outliers with standard deviation
Population variance versus sample variance
The mathematical explanation
Analyzing standard deviation and variance on a histogram
Using Python to compute standard deviation and variance
Try it yourself
Probability density function and probability mass function
The probability density function and probability mass functions
Probability density functions
Probability mass functions
Types of data distributions
Uniform distribution
Normal or Gaussian distribution
The exponential probability distribution or power law
Binomial probability mass function
Poisson probability mass function
Percentiles and moments
Percentiles
Quartiles
Computing percentiles in Python
Moments
Computing moments in Python
Summary
Matplotlib and Advanced Probability Concepts
A crash course in Matplotlib
Generating multiple plots on one graph
Saving graphs as images
Adjusting the axes
Adding a grid
Changing line types and colors
Labeling axes and adding a legend
A fun example
Generating pie charts
Generating bar charts
Generating scatter plots
Generating histograms
Generating box-and-whisker plots
Try it yourself
Covariance and correlation
Defining the concepts
Measuring covariance
Correlation
Computing covariance and correlation in Python
Computing correlation – The hard way
Computing correlation – The NumPy way
Correlation activity
Conditional probability
Conditional probability exercises in Python
Conditional probability assignment
My assignment solution
Bayes' theorem
Summary
Predictive Models
Linear regression
The ordinary least squares technique
The gradient descent technique
The coefficient of determination or r-squared
Computing r-squared
Interpreting r-squared
Computing linear regression and r-squared using Python
Activity for linear regression
Polynomial regression
Implementing polynomial regression using NumPy
Computing the r-squared error
Activity for polynomial regression
Multivariate regression and predicting car prices
Multivariate regression using Python
Activity for multivariate regression
Multi-level models
Summary
Machine Learning with Python
Machine learning and train/test
Unsupervised learning
Supervised learning
Evaluating supervised learning
K-fold cross validation
Using train/test to prevent overfitting of a polynomial regression
Activity
Bayesian methods - Concepts
Implementing a spam classifier with Naïve Bayes
Activity
K-Means clustering
Limitations to k-means clustering
Clustering people based on income and age
Activity
Measuring entropy
Decision trees - Concepts
Decision tree example
Walking through a decision tree
Random forests technique
Decision trees - Predicting hiring decisions using Python
Ensemble learning – Using a random forest
Activity
Ensemble learning
Support vector machine overview
Using SVM to cluster people by using scikit-learn
Activity
Summary
Recommender Systems
What are recommender systems?
User-based collaborative filtering
Limitations of user-based collaborative filtering
Item-based collaborative filtering
Understanding item-based collaborative filtering
How item-based collaborative filtering works
Collaborative filtering using Python
Finding movie similarities
Understanding the code
The corrwith function
Improving the results of movie similarities
Making movie recommendations to people
Understanding movie recommendations with an example
Using the groupby command to combine rows
Removing entries with the drop command
Improving the recommendation results
Summary
More Data Mining and Machine Learning Techniques
K-nearest neighbors - concepts
Using KNN to predict a rating for a movie
Activity
Dimensionality reduction and principal component analysis
Dimensionality reduction
Principal component analysis
A PCA example with the Iris dataset
Activity
Data warehousing overview
ETL versus ELT
Reinforcement learning
Q-learning
The exploration problem
The simple approach
The better way
Fancy words
Markov decision process
Dynamic programming
Summary
Dealing with Real-World Data
Bias/variance trade-off
K-fold cross-validation to avoid overfitting
Example of k-fold cross-validation using scikit-learn
Data cleaning and normalization
Cleaning web log data
Applying a regular expression on the web log
Modification one - filtering the request field
Modification two - filtering post requests
Modification three - checking the user agents
Filtering the activity of spiders/robots
Modification four - applying website-specific filters
Activity for web log data
Normalizing numerical data
Detecting outliers
Dealing with outliers
Activity for outliers
Summary
Apache Spark - Machine Learning on Big Data
Installing Spark
Installing Spark on Windows
Installing Spark on other operating systems
Installing the Java Development Kit
Installing Spark
Spark introduction
It's scalable
It's fast
It's young
It's not difficult
Components of Spark
Python versus Scala for Spark
Spark and Resilient Distributed Datasets (RDD)
The SparkContext object
Creating RDDs
Creating an RDD using a Python list
Loading an RDD from a text file
More ways to create RDDs
RDD operations
Transformations
Using map()
Actions
Introducing MLlib
Some MLlib Capabilities
Special MLlib data types
The vector data type
LabeledPoint data type
Rating data type
Decision Trees in Spark with MLlib
Exploring decision trees code
Creating the SparkContext
Importing and cleaning our data
Creating a test candidate and building our decision tree
Running the script
K-Means Clustering in Spark
Within set sum of squared errors (WSSSE)
Running the code
TF-IDF
TF-IDF in practice
Using TF-IDF
Searching Wikipedia with Spark MLlib
Import statements
Creating the initial RDD
Creating and transforming a HashingTF object
Computing the TF-IDF score
Using the Wikipedia search engine algorithm
Running the algorithm
Using the Spark 2.0 DataFrame API for MLlib
How Spark 2.0 MLlib works
Implementing linear regression
Summary
Testing and Experimental Design
A/B testing concepts
A/B tests
Measuring conversion for A/B testing
How to attribute conversions
Variance is your enemy
T-test and p-value
The t-statistic or t-test
The p-value
Measuring t-statistics and p-values using Python
Running A/B test on some experimental data
When there's no real difference between the two groups
Does the sample size make a difference?
Sample size increased to six digits
Sample size increased to seven digits
A/A testing
Determining how long to run an experiment for
A/B test gotchas
Novelty effects
Seasonal effects
Selection bias
Auditing selection bias issues
Data pollution
Attribution errors
Summary