Chapter 8. Scrapy

Scrapy is a popular web scraping framework that comes with many high-level functions to make scraping websites easier. In this chapter, we will get to know Scrapy by using it to scrape the example website, just as we did in Chapter 2, Scraping the Data. Then, we will cover Portia, an application based on Scrapy that allows you to scrape a website through a point-and-click interface.

Installation

Scrapy can be installed with the pip command, as follows:

pip install Scrapy

Scrapy relies on some external libraries, so if you have trouble installing it, additional information is available on the official website at http://doc.scrapy.org/en/latest/intro/install.html.

Currently, Scrapy only supports Python 2.7, which is more restrictive than the other packages introduced in this book. Python 2.6 was previously supported, but support was dropped in Scrapy 0.20. Also, due to the dependency on Twisted, support for Python 3 is not yet possible, though the Scrapy team assures me they are working to solve this.

If Scrapy is installed correctly, a scrapy command will now be available in the terminal:

$ scrapy -h
Scrapy 0.24.4 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
...
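
The same check can also be made from within Python, which may be handy in a setup script. The following is a minimal sketch, not part of Scrapy itself: the helper name installed_scrapy_version is hypothetical, and it assumes the installed package exposes a __version__ attribute, falling back to "unknown" if it does not.

```python
from __future__ import print_function


def installed_scrapy_version():
    """Return the installed Scrapy version string, or None if Scrapy cannot be imported."""
    try:
        # The import is attempted lazily so this script still runs without Scrapy
        import scrapy
    except ImportError:
        return None
    # Assumption: the package exposes __version__; otherwise report "unknown"
    return getattr(scrapy, "__version__", "unknown")


if __name__ == "__main__":
    version = installed_scrapy_version()
    if version is None:
        print("Scrapy is not installed; try: pip install Scrapy")
    else:
        print("Scrapy version:", version)
```

If this reports that Scrapy is missing while the pip command appeared to succeed, pip may have installed the package for a different Python interpreter.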

We will use the following commands in this chapter:

startproject: Creates a new project
genspider: Generates a new spider from a template
crawl: Runs a spider
shell: Starts the interactive scraping console
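
To see how these four commands fit together, here is a rough sketch that builds the command lines for a typical session in the order they would usually be run. The helper scrapy_workflow and the project, spider, domain, and URL values are hypothetical placeholders chosen for illustration; only the command names and their argument order come from Scrapy.

```python
from __future__ import print_function


def scrapy_workflow(project, spider, domain, url):
    """Return the argument lists for a typical Scrapy session, in order."""
    return [
        # Create the project skeleton in a new directory named after the project
        ["scrapy", "startproject", project],
        # Run inside the project directory: generate a spider for the domain
        ["scrapy", "genspider", spider, domain],
        # Run inside the project directory: start crawling with that spider
        ["scrapy", "crawl", spider],
        # Open the interactive console with the given page already downloaded
        ["scrapy", "shell", url],
    ]


if __name__ == "__main__":
    # Placeholder names; substitute your own project, spider, and site
    steps = scrapy_workflow("example", "country", "example.com", "http://example.com")
    for argv in steps:
        print("$", " ".join(argv))
```

Each list could be passed to subprocess.call to run the step, but printing them first makes it easy to review what would be executed.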

Note

For detailed information about these and the other commands available, refer to http://doc.scrapy.org/en/latest/topics/commands.html.
