Table of Contents for
Automated data collection with R
by Dominic Nyhuis, Peter Meissner, Christian Rubba, Simon Munzert
Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining
Preface
What you won't learn from reading this book
Why R?
Recommended reading to get started with R
Typographic conventions
The book's website
Disclaimer
Acknowledgments
Note
Chapter 1: Introduction
1.1 Case study: World Heritage Sites in Danger
1.2 Some remarks on web data quality
1.3 Technologies for disseminating, extracting, and storing web data
1.4 Structure of the book
Notes
Part One: A Primer on Web and Data Technologies
Chapter 2: HTML
2.1 Browser presentation and source code
2.2 Syntax rules
2.3 Tags and attributes
2.4 Parsing
Summary
Further reading
Problems
Notes
Chapter 3: XML and JSON
3.1 A short example XML document
3.2 XML syntax rules
3.3 When is an XML document well formed or valid?
3.4 XML extensions and technologies
3.5 XML and R in practice
3.6 A short example JSON document
3.7 JSON syntax rules
3.8 JSON and R in practice
Summary
Further reading
Problems
Notes
Chapter 4: XPath
4.1 XPath—a query language for web documents
4.2 Identifying node sets with XPath
4.3 Extracting node elements
Summary
Further reading
Problems
Notes
Chapter 5: HTTP
5.1 HTTP fundamentals
5.2 Advanced features of HTTP
5.3 Protocols beyond HTTP
5.4 HTTP in action
Summary
Further reading
Problems
Notes
Chapter 6: AJAX
6.1 JavaScript
6.2 XHR
6.3 Exploring AJAX with Web Developer Tools
Summary
Further reading
Problems
Chapter 7: SQL and relational databases
7.1 Overview and terminology
7.2 Relational Databases
7.3 SQL: a language to communicate with Databases
7.4 Databases in action
Summary
Further reading
Problems
Pokemon problems
ParlGov problems
Notes
Chapter 8: Regular expressions and essential string functions
8.1 Regular expressions
8.2 String processing
8.3 A word on character encodings
Summary
Further reading
Problems
Notes
Part Two: A Practical Toolbox for Web Scraping and Text Mining
Chapter 9: Scraping the Web
9.1 Retrieval scenarios
9.2 Extraction strategies
9.3 Web scraping: Good practice
9.4 Valuable sources of inspiration
Summary
Further reading
Problems
Notes
Chapter 10: Statistical text processing
10.1 The running example: Classifying press releases of the British government
10.2 Processing textual data
10.3 Supervised learning techniques
10.4 Unsupervised learning techniques
Summary
Further reading
Notes
Chapter 11: Managing data projects
11.1 Interacting with the file system
11.2 Processing multiple documents/links
11.3 Organizing scraping procedures
11.4 Executing R scripts on a regular basis
Notes
Part Three: A Bag of Case Studies
Chapter 12: Collaboration networks in the US Senate
12.1 Information on the bills
12.2 Information on the senators
12.3 Analyzing the network structure
12.4 Conclusion
Notes
Chapter 13: Parsing information from semistructured documents
13.1 Downloading data from the FTP server
13.2 Parsing semistructured text data
13.3 Visualizing station and temperature data
Notes
Chapter 14: Predicting the 2014 Academy Awards using Twitter
14.1 Twitter APIs: Overview
14.2 Twitter-based forecast of the 2014 Academy Awards
14.3 Conclusion
Notes
Chapter 15: Mapping the geographic distribution of names
15.1 Developing a data collection strategy
15.2 Website inspection
15.3 Data retrieval and information extraction
15.4 Mapping names
15.5 Automating the process
Summary
Notes
Chapter 16: Gathering data on mobile phones
16.1 Page exploration
16.2 Scraping procedure
16.3 Graphical analysis
16.4 Data storage
Note
Chapter 17: Analyzing sentiments of product reviews
17.1 Introduction
17.2 Collecting the data
17.3 Analyzing the data
17.4 Conclusion
Notes
References
General index
Package index
Function index
End User License Agreement