Writing files asynchronously

Most of the time, we want to collect data by making requests to multiple websites, and simply printing out the response HTML code is impractical (for many reasons); instead, we'd like to write the returned HTML code to output files. In essence, this process is asynchronous downloading, which also underlies the architecture of popular download managers. To do this, we will use the aiofiles module, in combination with aiohttp and asyncio.

Navigate to the Chapter11/example5.py file. First, we will look at the download_html() coroutine, as follows:

# Chapter11/example5.py

import asyncio
import os

import aiofiles
import aiohttp

async def download_html(session, url):
    async with session.get(url, ssl=False) as res:
        # Name the output file after the last component of the URL
        filename = f'output/{os.path.basename(url)}.html'

        async with aiofiles.open(filename, 'wb') as f:
            # Read the response body in 1,024-byte chunks; each await
            # hands control back to the event loop
            while True:
                chunk = await res.content.read(1024)
                if not chunk:
                    break
                await f.write(chunk)

    # Redundant safeguard: the async with block above already
    # releases the connection on exit
    return await res.release()

This is an updated version of the get_html() coroutine from the last example. Instead of using an aiohttp.ClientSession instance to make a GET request and print out the returned HTML code, we now write that HTML code to a file via the aiofiles module. Specifically, to facilitate asynchronous file writing, we use the asynchronous open() function from aiofiles to open the output file inside a context manager. Furthermore, we read the returned HTML in chunks, asynchronously, using the read() method of the response's content attribute; this means that after reading 1,024 bytes of the current response, the execution flow is released back to the event loop, and a task switch can take place.
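
As a side note, aiohttp's response streams also offer the iter_chunked() method, an async iterator that expresses the same chunked read more compactly. The following variant is a sketch of that alternative, not the code used in example5.py:

async def download_html(session, url):
    async with session.get(url, ssl=False) as res:
        filename = f'output/{os.path.basename(url)}.html'

        async with aiofiles.open(filename, 'wb') as f:
            # iter_chunked() yields the body in 1,024-byte pieces,
            # replacing the explicit while/break loop
            async for chunk in res.content.iter_chunked(1024):
                await f.write(chunk)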

The main() coroutine and the main program of this example remain relatively the same as those in our last example:

async def main(url):
    async with aiohttp.ClientSession() as session:
        await download_html(session, url)

urls = [
    'http://packtpub.com',
    'http://python.org',
    'http://docs.python.org/3/library/asyncio',
    'http://aiohttp.readthedocs.io',
    'http://google.com'
]

loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(*(main(url) for url in urls))
)

The main() coroutine takes in a URL and passes it, along with an aiohttp.ClientSession instance, to the download_html() coroutine. Finally, in our main program, we create an event loop and use it to run a main() task for each item in a specified list of URLs, combined with asyncio.gather(). After running the program, your output should look similar to the following, although the time it takes to run the program might vary:

> python3 example5.py
Took 0.72 seconds.
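
As a side note, on Python 3.7 and later, the same fan-out can be written without managing the event loop by hand, using asyncio.run(); this equivalent is a sketch, not part of example5.py:

async def run_all():
    # Schedule one main() task per URL and wait for all of them
    await asyncio.gather(*(main(url) for url in urls))

asyncio.run(run_all())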

Additionally, a subfolder named output (inside the Chapter11 folder) will be filled with the downloaded HTML code from each website in our list of URLs. Again, these files were created and written asynchronously, via the functionality of the aiofiles module discussed earlier. Finally, to compare the speed of this program with that of its corresponding synchronous version, we also keep track of the time it takes to run the entire program.
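
The timing logic itself is omitted from the snippets above; a minimal sketch of such a wrapper, assuming the timeit.default_timer clock (the exact timer used in example5.py is an assumption), might look as follows:

from timeit import default_timer as timer

start = timer()
loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(*(main(url) for url in urls))
)
# Report the wall-clock time for all downloads combined
print('Took %.2f seconds.' % (timer() - start))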

Now, head to the Chapter11/example6.py file. This script contains the code of the synchronous version of our current program. Specifically, it makes HTTP GET requests to individual websites in order, and the process of file writing is also implemented sequentially. This script produced the following output:

> python3 example6.py
Took 1.47 seconds.

While it achieved the same results (downloading the HTML code and writing it to files), our sequential program took significantly more time than its asynchronous counterpart.
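
example6.py itself is not reproduced here; a minimal sketch of such a sequential version, assuming the requests library (the actual code in example6.py may differ), could look as follows:

import os

import requests

def download_html(url):
    # Fetch the page, blocking until the full body has arrived
    res = requests.get(url)

    filename = f'output/{os.path.basename(url)}.html'
    with open(filename, 'wb') as f:
        f.write(res.content)

# Each download only starts after the previous one finishes
for url in urls:
    download_html(url)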
