To get the most out of this book
by Anish Chapagain
Hands-On Web Scraping with Python
Title Page
Copyright and Credits
Hands-On Web Scraping with Python
Dedication
About Packt
Why subscribe?
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Introduction to Web Scraping
Web Scraping Fundamentals
Introduction to web scraping
Understanding web development and technologies
HTTP
HTML 
HTML elements and attributes
Global attributes
XML
JavaScript
JSON
CSS
AngularJS
Data finding techniques for the web
HTML page source
Case 1
Case 2
Developer tools
Sitemaps
The robots.txt file
Summary
Further reading
Section 2: Beginning Web Scraping
Python and the Web – Using urllib and Requests
Technical requirements
Accessing the web with Python
Setting things up
Loading URLs
URL handling and operations with urllib and requests
urllib
requests
Implementing HTTP methods
GET
POST
Summary
Further reading
Using LXML, XPath, and CSS Selectors
Technical requirements
Introduction to XPath and CSS selector
XPath
CSS selectors
Element selectors
ID and class selectors
Attribute selectors
Pseudo selectors
Using web browser developer tools for accessing web content
HTML elements and DOM navigation
XPath and CSS selectors using DevTools
Scraping using lxml, a Python library
lxml by examples
Example 1 – reading XML from file and traversing through its elements
Example 2 – reading HTML documents using lxml.html
Example 3 – reading and parsing HTML for retrieving HTML form type element attributes
Web scraping using lxml
Example 1 – extracting selected data from a single page using lxml.html.xpath
Example 2 – looping with XPath and scraping data from multiple pages
Example 3 – using lxml.cssselect to scrape content from a page
Summary
Further reading
Scraping Using pyquery – a Python Library
Technical requirements
Introduction to pyquery
Exploring pyquery
Loading documents
Element traversing, attributes, and pseudo-classes
Iterating
Web scraping using pyquery
Example 1 – scraping data science announcements
Example 2 – scraping information from nested links
Example 3 – extracting AHL Playoff results
Example 4 – collecting URLs from sitemap.xml
Case 1 – using the HTML parser
Case 2 – using the XML parser
Summary
Further reading
Web Scraping Using Scrapy and Beautiful Soup
Technical requirements
Web scraping using Beautiful Soup
Introduction to Beautiful Soup
Exploring Beautiful Soup
Searching, traversing, and iterating
Using children and parents
Using next and previous
Using CSS Selectors
Example 1 – listing <li> elements with the data-id attribute 
Example 2 – traversing through elements
Example 3 – searching elements based on attribute values
Building a web crawler
Web scraping using Scrapy
Introduction to Scrapy
Setting up a project
Generating a Spider
Creating an item
Extracting data
Using XPath
Using CSS Selectors
Data from multiple pages
Running and exporting
Deploying a web crawler
Summary
Further reading
Section 3: Advanced Concepts
Working with Secure Web
Technical requirements
Introduction to secure web
Form processing
Cookies and sessions
Cookies
Sessions
User authentication
HTML <form> processing
Handling user authentication
Working with cookies and sessions
Summary
Further reading
Data Extraction Using Web-Based APIs
Technical requirements
Introduction to web APIs
REST and SOAP
REST 
SOAP 
Benefits of web APIs
Accessing web API and data formats
Making requests to the web API using a web browser
Case 1 – accessing a simple API (request and response)
Case 2 – demonstrating status codes and informative responses from the API
Case 3 – demonstrating RESTful API cache functionality
Web scraping using APIs
Example 1 – searching and collecting university names and URLs
Example 2 – scraping information from GitHub events
Summary
Further reading
Using Selenium to Scrape the Web
Technical requirements
Introduction to Selenium
Selenium projects
Selenium WebDriver
Selenium RC
Selenium Grid
Selenium IDE
Setting things up
Exploring Selenium
Accessing browser properties
Locating web elements
Using Selenium for web scraping
Example 1 – scraping product information
Example 2 – scraping book information
Summary
Further reading
Using Regex to Extract Data
Technical requirements
Overview of regular expressions
Regular expressions and Python
Using regular expressions to extract data
Example 1 – extracting HTML-based content
Example 2 – extracting dealer locations
Example 3 – extracting XML content
Summary
Further reading
Section 4: Conclusion
Next Steps
Technical requirements
Managing scraped data
Writing to files
Analysis and visualization using pandas and matplotlib
Machine learning 
ML and AI
Python and ML
Types of ML algorithms
Supervised learning
Classification
Regression
Unsupervised learning
Association
Clustering
Reinforcement learning
Data mining 
Tasks of data mining
Predictive
Classification
Regression
Prediction 
Descriptive
Clustering
Summarization
Association rules
What's next?
Summary 
Further reading
Other Books You May Enjoy
Leave a review - let other readers know what you think
To get the most out of this book
Readers should have some working knowledge of the Python programming language.