Example three – downloading random pictures

This example has been fun to code. We are going to download random pictures from a website. I'll show you three versions: a serial one, a multiprocessing one, and finally a solution coded using asyncio. In these examples, we are going to use a website called http://lorempixel.com, which provides you with an API that you can call to get random images. If you find that the website is down or slow, you can use an excellent alternative to it: https://lorempizza.com/.

It may be something of a cliché for a book written by an Italian, but the pictures are gorgeous. You can search for another alternative on the web, if you want to have some fun. Whatever website you choose, please be sensible and try not to hammer it by making a million requests to it. The multiprocessing and asyncio versions of this code can be quite aggressive!

Let's start by exploring the single-threaded version of the code:

# aio/randompix_serial.py
import os
from secrets import token_hex
import requests

PICS_FOLDER = 'pics'
URL = 'http://lorempixel.com/640/480/'

def download(url):
    resp = requests.get(url)
    return save_image(resp.content)

def save_image(content):
    filename = '{}.jpg'.format(token_hex(4))
    path = os.path.join(PICS_FOLDER, filename)
    with open(path, 'wb') as stream:
        stream.write(content)
    return filename

def batch_download(url, n):
    return [download(url) for _ in range(n)]

if __name__ == '__main__':
    saved = batch_download(URL, 10)
    print(saved)

This code should be straightforward to you by now. We define a download function, which makes a request to the given URL and saves the result by calling save_image, feeding it the body of the response. Saving the image is very simple: we create a random filename with token_hex, just because it's fun, then we calculate the full path of the file, open it in binary mode, and write the content of the response into it. We return the filename so we can print it on screen. Finally, batch_download simply runs the n requests we want and returns the filenames as a result.
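In case you are wondering, token_hex(4) returns 4 random bytes rendered as 8 hexadecimal characters, so the files end up with names like the one below (the values are random, so yours will differ):

>>> from secrets import token_hex
>>> token_hex(4)
'1b9d0fe3'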

You can skip over the if __name__ ... line for now; it's not important here. All we do is call batch_download with the URL, telling it to download 10 images. If you open the pics folder in your editor, you can watch it being populated within a few seconds (note that the script assumes the pics folder already exists).
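If you would rather have the script create the folder itself, you can add a one-line guard after the constants (this line is not part of the listing above):

# make sure the destination folder exists before downloading
os.makedirs(PICS_FOLDER, exist_ok=True)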

Let's spice things up a bit and introduce multiprocessing (most of the code is the same, so I will show only what changes):

# aio/randompix_proc.py
...
from concurrent.futures import ProcessPoolExecutor, as_completed
...

def batch_download(url, n, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = (executor.submit(download, url) for _ in range(n))
        return [future.result() for future in as_completed(futures)]

...

The technique should be familiar to you by now. We simply submit jobs to the executor and collect the results as they become available. Because this code is IO bound, the processes spend most of their time waiting for the API to respond, with plenty of context switching in the meantime. If you keep an eye on the pics folder, you will notice that it is no longer populated in a linear fashion, but rather in batches.
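Incidentally, precisely because the work is IO bound, threads would serve just as well as processes here. As a sketch (only the executor class changes, the rest of the module stays the same):

# randompix_thread.py - hypothetical variant, not shown in the original
from concurrent.futures import ThreadPoolExecutor, as_completed

def batch_download(url, n, workers=4):
    # threads are enough here: each worker mostly waits on the network
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = (executor.submit(download, url) for _ in range(n))
        return [future.result() for future in as_completed(futures)]

Threads are cheaper to spawn than processes and, since the GIL is released while waiting on the network, the downloads still overlap.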

Let's now look at the asyncio version of this example.
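What follows is a minimal sketch of how such a version could look; it assumes the third-party aiohttp library for asynchronous HTTP requests (installable with pip install aiohttp), and both the filename and the helper names are only indicative:

# aio/randompix_corout.py (sketch; assumes aiohttp is installed)
import os
from secrets import token_hex
import asyncio
import aiohttp

PICS_FOLDER = 'pics'
URL = 'http://lorempixel.com/640/480/'

async def download_image(session, url):
    # await the response without blocking the event loop
    async with session.get(url) as resp:
        content = await resp.read()
    return save_image(content)

def save_image(content):
    filename = '{}.jpg'.format(token_hex(4))
    path = os.path.join(PICS_FOLDER, filename)
    with open(path, 'wb') as stream:
        stream.write(content)
    return filename

async def batch_download(url, n):
    # a single session is shared by all the concurrent requests
    async with aiohttp.ClientSession() as session:
        coros = [download_image(session, url) for _ in range(n)]
        return await asyncio.gather(*coros)

if __name__ == '__main__':
    saved = asyncio.run(batch_download(URL, 10))
    print(saved)

Here everything runs on a single thread: all n requests are in flight at once, and the event loop switches between the coroutines whenever one of them is waiting on the network.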
