Sequential crawler

Here is the code to use AlexaCallback with the link crawler developed earlier to download sequentially:

scrape_callback = AlexaCallback()
link_crawler(seed_url=scrape_callback.seed_url, 
    cache_callback=MongoCache(),
    scrape_callback=scrape_callback)

This code is available at https://bitbucket.org/wswp/code/src/tip/chapter04/sequential_test.py and can be run from the command line as follows:

$ time python sequential_test.py
...
26m41.141s

This time is as expected for sequential downloading with an average of ~1.6 seconds per URL.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.167.173