Fetching a website's HTML code

First, let's look at how to make a request and obtain the HTML source code from a single website with aiohttp. Note that even with only one task (a single website), our application remains asynchronous, so the structure of an asynchronous program still needs to be in place. Open the Chapter11/example4.py file, which contains the following:

# Chapter11/example4.py

import aiohttp
import asyncio

async def get_html(session, url):
    # ssl=False skips certificate verification for this request
    async with session.get(url, ssl=False) as res:
        return await res.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await get_html(session, 'http://packtpub.com')
        print(html)

# asyncio.run() creates the event loop, runs main(), and closes the loop
asyncio.run(main())

Let's consider the main() coroutine first. We create an instance of the aiohttp.ClientSession class within a context manager; note that we place the async keyword in front of the with statement, since the session is an asynchronous context manager whose setup and cleanup steps are themselves coroutines. Inside this block, we call the get_html() coroutine and await its result.
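To make the role of async with more concrete, here is a minimal sketch that uses only the standard library. The ToySession class is a hypothetical stand-in invented for illustration, not part of aiohttp; it only shows that entering and leaving an async with block awaits the __aenter__() and __aexit__() coroutines.

```python
import asyncio


class ToySession:
    """Hypothetical stand-in for a session object; not part of aiohttp."""

    def __init__(self):
        self.events = []

    async def __aenter__(self):
        # Awaited when the `async with` block is entered.
        await asyncio.sleep(0)  # placeholder for asynchronous setup work
        self.events.append("opened")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Awaited when the block is left, even if it raised an exception.
        await asyncio.sleep(0)  # placeholder for asynchronous cleanup work
        self.events.append("closed")
        return False  # do not suppress exceptions


async def demo():
    async with ToySession() as session:
        session.events.append("in use")
    return session.events


events = asyncio.run(demo())
print(events)
```

Because both hooks are coroutines, a plain with statement cannot drive them; this is why the session in main() has to be opened with async with.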

Turning our attention to the get_html() coroutine, we can see that it takes in a session object and the URL of the website whose HTML source code we want to extract. Inside this function, we use another asynchronous context manager, which makes a GET request and stores the response from the server in the res variable. Finally, we return the HTML source code stored in the response; since the response's text() method is a coroutine (it reads the response body asynchronously), we need to specify the await keyword when we call it.
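The need for await on text() can be illustrated with another small, standard-library-only sketch. ToyResponse is a hypothetical class made up for this example and does not reproduce aiohttp's response object; it only assumes that text() is defined with async def, as described above.

```python
import asyncio
import inspect


class ToyResponse:
    """Hypothetical stand-in for a response object; not part of aiohttp."""

    def __init__(self, body):
        self._body = body

    async def text(self):
        await asyncio.sleep(0)  # placeholder for reading the body
        return self._body


async def demo():
    res = ToyResponse("<html></html>")

    # Without await, calling text() only creates a coroutine object.
    pending = res.text()
    was_coroutine = inspect.iscoroutine(pending)

    # Awaiting the coroutine runs it and produces the actual string.
    html = await pending
    return was_coroutine, html


was_coroutine, html = asyncio.run(demo())
print(was_coroutine, html)
```

Leaving out the await in get_html() would therefore return a coroutine object to the caller rather than the HTML string.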

When you run the program, the entire HTML source code of Packt's website will be printed out. For example, the following is a portion of my output:

HTML source code from aiohttp