The Requests library

So that's it for the urllib package. As you can see, the standard library is more than adequate for most HTTP tasks, and we haven't even touched upon all of its capabilities. There are numerous handler classes that we haven't discussed, and the opener interface is extensible.

However, the API isn't the most elegant, and there have been several attempts to improve it. One of these is the very popular third-party library called Requests. It's available as the requests package on PyPI, and it can either be installed with pip or downloaded from http://docs.python-requests.org, which hosts the documentation.

The Requests library automates and simplifies many of the tasks that we've been looking at. The quickest way of illustrating this is by trying some examples.

The commands for retrieving a URL with Requests are similar to those for the urllib package, as shown here:

>>> import requests
>>> response = requests.get('http://www.debian.org')

And we can look at properties of the response object. Try:

>>> response.status_code
200
>>> response.reason
'OK'
>>> response.url
'http://www.debian.org/'
>>> response.headers['content-type']
'text/html'

Note that the header name in the preceding command is in lowercase. The keys in the headers attribute of Requests response objects are case insensitive.
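
For example, the following lookups both return the header we retrieved in the preceding command, whatever capitalization we use:

>>> response.headers['Content-Type']
'text/html'
>>> response.headers['CONTENT-TYPE']
'text/html'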

There are some convenience attributes that have been added to the response object:

>>> response.ok
True

The ok attribute indicates whether the request was successful; that is, the response came back with a status code that doesn't indicate an error (one below 400). Also:

>>> response.is_redirect
False

The is_redirect attribute indicates whether the response itself is a redirect (rather than whether any redirects were followed along the way). We can also access the properties of the request through the response object:

>>> response.request.headers
{'User-Agent': 'python-requests/2.3.0 CPython/3.4.1 Linux/3.2.0-4-amd64', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*'}

Notice that Requests is automatically handling compression for us: it includes gzip and deflate in an Accept-Encoding header. If we look at the Content-Encoding response header, we will see that the response was in fact gzip compressed, and Requests transparently decompressed it for us:

>>> response.headers['content-encoding']
'gzip'

There are several ways of looking at the response content. To get the same bytes object as we got from an HTTPResponse object, do the following:

>>> response.content
b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n<html lang="en">...

But Requests also performs automatic decoding for us. To get the decoded content, do this:

>>> response.text
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n<html lang="en">\n<head>\n...

Notice that this is now str rather than bytes. The Requests library uses values in the headers for choosing a character set and decoding the content to Unicode for us. If it can't get a character set from the headers, then it uses the chardet library (http://pypi.python.org/pypi/chardet) to make an estimate from the content itself. We can see what encoding Requests has chosen here:

>>> response.encoding
'ISO-8859-1'

We can even ask it to change the encoding that it has used:

>>> response.encoding = 'utf-8'

After changing the encoding, subsequent references to the text attribute for this response will return the content decoded by using the new encoding setting.
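
For instance, carrying on from the assignment above, we could compare the two decodings; whether they actually differ depends on which non-ASCII characters the page happens to contain:

>>> utf8_text = response.text        # decoded using the new 'utf-8' setting
>>> response.encoding = 'ISO-8859-1'
>>> latin1_text = response.text      # the same raw bytes, decoded as ISO-8859-1 again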

The Requests library automatically handles cookies. Give the following a try:

>>> response = requests.get('http://www.github.com')
>>> print(response.cookies)
<<class 'requests.cookies.RequestsCookieJar'>
[<Cookie logged_in=no for .github.com/>,
 <Cookie _gh_sess=eyJzZxNz... for ..github.com/>]>

The Requests library also has a Session class, which allows the reuse of cookies; this is similar to using the http.cookiejar module's CookieJar and the urllib.request module's HTTPCookieProcessor objects. Do the following to reuse the cookies in subsequent requests:

>>> s = requests.Session()
>>> s.get('http://www.google.com')
>>> response = s.get('http://google.com/preferences')

The Session object has the same interface as the requests module, so we use its get() method in the same way as we use the requests.get() method. Now, any cookies encountered are stored in the Session object, and they will be sent automatically with the subsequent requests that we make through the same session.
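
We can check this by peeking at the Session's cookie jar and at the headers of the second request. The following is purely illustrative: the exact cookies depend on what Google sends back, and the final lookup is only True if the first response actually set a cookie:

>>> s.cookies                                # the cookies the session has collected so far
<RequestsCookieJar[...]>
>>> 'Cookie' in response.request.headers     # the second request sent them back
True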

Redirects are also automatically followed, in the same way as when using urllib, and any redirected requests are captured in the history attribute.
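
For example, at the time of writing, requesting http://github.com redirects us to the HTTPS version of the site, and the intermediate response shows up in history; the exact status codes and the length of the chain depend on the server:

>>> response = requests.get('http://github.com')
>>> response.url
'https://github.com/'
>>> response.history
[<Response [301]>]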

The different HTTP methods are easily accessible; each has its own function:

>>> response = requests.head('http://www.google.com')
>>> response.status_code
200
>>> response.text
''

Custom headers are added to requests in a similar way as they are when using urllib:

>>> headers = {'User-Agent': 'Mozilla/5.0 Firefox 24'}
>>> response = requests.get('http://www.debian.org', headers=headers)

Making requests with query strings is a straightforward process:

>>> params = {':action': 'search', 'term': 'Are you quite sure this is a cheese shop?'}
>>> response = requests.get('http://pypi.python.org/pypi', params=params)
>>> response.url
'https://pypi.python.org/pypi?%3Aaction=search&term=Are+you+quite+sure+this+is+a+cheese+shop%3F'

The Requests library takes care of all the encoding and formatting for us.

Posting is similarly simplified, although we use the data keyword argument here:

>>> data = {'P': 'Python'}
>>> response = requests.post('http://search.debian.org/cgi-bin/omega', data=data)
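
If we're curious about what Requests actually sent, the prepared request hanging off the response shows the content type and the form-encoded body that were generated from our data dictionary:

>>> response.request.headers['Content-Type']
'application/x-www-form-urlencoded'
>>> response.request.body
'P=Python'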

Handling errors with Requests

Errors in Requests are handled slightly differently from how they are handled with urllib. Let's work through some error conditions and see how it works. Generate a 404 error by doing the following:

>>> response = requests.get('http://www.google.com/notawebpage')
>>> response.status_code
404

In this situation, urllib would have raised an exception, but notice that Requests doesn't. The Requests library can check the status code and raise a corresponding exception, but we have to ask it to do so:

>>> response.raise_for_status()
...
requests.exceptions.HTTPError: 404 Client Error

Now, try it on a successful request:

>>> r = requests.get('http://www.google.com')
>>> r.status_code
200
>>> r.raise_for_status()

It doesn't do anything, which in most situations would let our program exit a try/except block and then continue as we would want it to.
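
Putting this together, a typical pattern looks like the following sketch, reusing the nonexistent Google page from above; the exact wording of the exception message will vary between Requests versions:

>>> try:
...     response = requests.get('http://www.google.com/notawebpage')
...     response.raise_for_status()
... except requests.exceptions.HTTPError as e:
...     print('HTTP error:', e)
... else:
...     print('Request succeeded')
...
HTTP error: 404 Client Error ...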

What happens if we get an error that is lower in the protocol stack? Try the following:

>>> r = requests.get('http://192.0.2.1')
...
requests.exceptions.ConnectionError: HTTPConnectionPool(...

We have made a request to a host that doesn't exist, and once it has timed out, we get a ConnectionError exception.
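
We can catch ConnectionError in the same way. Since waiting for the operating system's default timeout can take a while, it's usually worth supplying the timeout keyword argument (in seconds) so that Requests gives up sooner. Here's a sketch, with the error message truncated as before:

>>> try:
...     response = requests.get('http://192.0.2.1', timeout=5)
... except requests.exceptions.ConnectionError as e:
...     print('Could not connect:', e)
...
Could not connect: HTTPConnectionPool(...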

The Requests library simply reduces the workload involved in using HTTP in Python as compared to urllib. Unless you have a specific requirement to use urllib, I would always recommend using Requests for your projects.
