Synchronous web scraping

The synchronous scraper uses only Python standard library modules such as urllib. It downloads the home pages of three popular sites, plus a fourth site whose loading time can be delayed to simulate a slow connection. It prints the respective page sizes and the total running time.

Here's the code for the synchronous scraper located at src/extras/sync.py:

"""Synchronously download a list of webpages and time it""" 
from urllib.request import Request, urlopen 
from time import time 
 
sites = [ 
    "http://news.ycombinator.com/", 
    "https://www.yahoo.com/", 
    "http://www.aliexpress.com/", 
    "http://deelay.me/5000/http://deelay.me/", 
] 
 
 
def find_size(url):
    """Return the size of the page at url, in characters."""
    req = Request(url)
    with urlopen(req) as response:
        page = response.read()
        return len(page)


def main():
    # Download each site one after another; each request blocks until it finishes
    for site in sites:
        size = find_size(site)
        print("Read {:8d} chars from {}".format(size, site))
 
 
if __name__ == '__main__': 
    start_time = time() 
    main() 
    print("Ran in {:6.3f} secs".format(time() - start_time)) 

On a test laptop, this code took 17.1 seconds to run. Since the requests are made one after another, this is the cumulative loading time of all the sites. Let's see how asynchronous code performs.
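
For comparison, here is a minimal sketch of what an asynchronous version could look like. It assumes Python 3.7+ and the third-party aiohttp package; the coroutine names mirror the synchronous version, but the exact structure is illustrative rather than the book's actual src/extras listing:

"""Asynchronously download a list of webpages and time it (sketch)"""
import asyncio
from time import time

import aiohttp  # third-party package: pip install aiohttp

sites = [
    "http://news.ycombinator.com/",
    "https://www.yahoo.com/",
    "http://www.aliexpress.com/",
    "http://deelay.me/5000/http://deelay.me/",
]


async def find_size(session, url):
    # Await the response without blocking the event loop
    async with session.get(url) as response:
        page = await response.read()
        return len(page)


async def show_size(session, url):
    size = await find_size(session, url)
    print("Read {:8d} chars from {}".format(size, url))


async def main():
    async with aiohttp.ClientSession() as session:
        # Start all downloads concurrently and wait for every one to finish
        await asyncio.gather(*(show_size(session, site) for site in sites))


if __name__ == '__main__':
    start_time = time()
    asyncio.run(main())
    print("Ran in {:6.3f} secs".format(time() - start_time))

Because all four requests are in flight at the same time, the total running time is roughly that of the slowest site rather than the sum of all of them, so the deliberately delayed deelay.me request no longer holds up the rest.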
