Customizing requests

To make use of the functionality that headers provide, we add headers to a request before sending it. To do this, we can't just use urlopen(). We need to follow these steps:

  • Create a Request object
  • Add headers to the request object
  • Use urlopen() to send the request object

We're going to learn how to customize a request for retrieving a Swedish version of the Debian home page. We will use the Accept-Language header, which tells the server our preferred language for the resource it returns. Note that not all servers hold versions of resources in multiple languages, so not all servers will respond to Accept-Language.

First, we create a Request object:

>>> from urllib.request import Request, urlopen
>>> req = Request('http://www.debian.org')

Next we add the header:

>>> req.add_header('Accept-Language', 'sv')

The add_header() method takes the name of the header and the contents of the header as arguments. The Accept-Language header takes language tags, most commonly two-letter ISO 639-1 language codes. The code for Swedish is sv.

Lastly, we submit the customized request with urlopen():

>>> response = urlopen(req)

We can check if the response is in Swedish by printing out the first few lines:

>>> response.readlines()[:5]
[b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n',
  b'<html lang="sv">\n',
  b'<head>\n',
  b'  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n',
  b'  <title>Debian -- Det universella operativsystemet </title>\n']

Jättebra! The Accept-Language header has informed the server about our preferred language for the response's content.
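The three steps can be collected into a pair of small helpers. This is only a sketch: the function names make_language_request() and first_lines() are our own and not part of urllib, and first_lines() needs network access when it is called.

```python
from urllib.request import Request, urlopen


def make_language_request(url, language):
    """Build a Request that asks for url in the given language."""
    req = Request(url)                           # step 1: create the Request
    req.add_header('Accept-Language', language)  # step 2: add the header
    return req


def first_lines(req, count=5):
    """Send the request and return the first count lines of the body."""
    # step 3: urlopen() sends the Request; the with block closes the
    # connection once we have finished reading.
    with urlopen(req) as response:
        return response.readlines()[:count]


# Building the request does not touch the network.
req = make_language_request('http://www.debian.org', 'sv')
```

Calling first_lines(req) then sends the request and returns the same kind of list that we printed previously.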

To view the headers present in a request, do the following:

>>> req = Request('http://www.debian.org')
>>> req.add_header('Accept-Language', 'sv')
>>> req.header_items()
[('Accept-language', 'sv')]

The urlopen() function adds some of its own headers when we run it on a request:

>>> response = urlopen(req)
>>> req.header_items()
[('Accept-language', 'sv'), ('User-agent', 'Python-urllib/3.4'), ('Host', 'www.debian.org')]

A shortcut for adding headers is to add them at the same time that we create the request object, as shown here:

>>> headers = {'Accept-Language': 'sv'}
>>> req = Request('http://www.debian.org', headers=headers)
>>> req.header_items()
[('Accept-language', 'sv')]

We supply the headers as a dict to the Request object constructor as the headers keyword argument. In this way, we can add multiple headers in one go, by adding more entries to the dict.
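As a small illustration, the following sketch passes two headers in one dict and then looks them up again. The User-Agent value is a made-up example. Note that Request stores each header name with only its first letter capitalized, which is why header_items() shows Accept-language, and why has_header() and get_header() expect the name in that form.

```python
from urllib.request import Request

headers = {
    'Accept-Language': 'sv',
    'User-Agent': 'example-client/1.0',  # hypothetical value for illustration
}
req = Request('http://www.debian.org', headers=headers)

# Request normalizes each name with str.capitalize(), so only the
# first letter of the whole name stays upper case.
stored = dict(req.header_items())
print(stored)

# Lookups are exact matches against the stored form of the name.
print(req.has_header('Accept-language'))  # True
print(req.has_header('Accept-Language'))  # False
print(req.get_header('User-agent'))       # example-client/1.0
```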

Let's take a look at some more things that we can do with headers.

Content compression

The Accept-Encoding request header and the Content-Encoding response header can work together to allow us to temporarily encode the body of a response for transmission over the network. This is typically used for compressing the response and reducing the amount of data that needs to be transferred.

This process follows these steps:

  • The client sends a request with acceptable encodings listed in an Accept-Encoding header
  • The server picks an encoding method that it supports
  • The server encodes the body using this encoding method
  • The server sends the response, specifying the encoding it has used in a Content-Encoding header
  • The client decodes the response body using the specified encoding method
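To make the sequence concrete, here is a purely local sketch that plays both roles without touching the network. The server_respond() and client_decode() functions and the SERVER_SUPPORTED tuple are invented for this illustration. The "server" simply picks the first encoding in the client's list that it supports, ignoring any weightings, which is a simplification of what real servers do.

```python
import gzip

# Assumed capabilities of our pretend server.
SERVER_SUPPORTED = ('gzip', 'identity')


def server_respond(accept_encoding, body):
    """Pick an encoding from the client's list and encode the body."""
    offered = [name.strip() for name in accept_encoding.split(',')]
    chosen = 'identity'
    for name in offered:
        if name in SERVER_SUPPORTED:
            chosen = name
            break
    headers = {}
    if chosen == 'gzip':
        body = gzip.compress(body)
        # The server tells the client which encoding it applied.
        headers['Content-Encoding'] = 'gzip'
    return headers, body


def client_decode(headers, body):
    """Undo the encoding named in the Content-Encoding header."""
    if headers.get('Content-Encoding') == 'gzip':
        return gzip.decompress(body)
    return body


page = b'<html>hello</html>' * 50
headers, wire_body = server_respond('gzip', page)
print(headers)
print(len(page), len(wire_body))  # original size vs. size on the wire
print(client_decode(headers, wire_body) == page)
```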

Let's discuss how to request a document and get the server to use gzip compression for the response body. First, let's construct the request:

>>> req = Request('http://www.debian.org')

Next, add the Accept-Encoding header:

>>> req.add_header('Accept-Encoding', 'gzip')

And then, submit it with the help of urlopen():

>>> response = urlopen(req)

We can check if the server is using gzip compression by looking at the response's Content-Encoding header:

>>> response.getheader('Content-Encoding')
'gzip'

We can then decompress the body data by using the gzip module:

>>> import gzip
>>> content = gzip.decompress(response.read())
>>> content.splitlines()[:5]
[b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
  b'<html lang="en">',
  b'<head>',
  b'  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
  b'  <title>Debian -- The Universal Operating System </title>']

Encodings are registered with IANA. The registry includes gzip, compress, deflate, and identity, among others. The first three refer to specific compression methods. The last one allows the client to specify that it doesn't want any encoding applied to the content.

Let's see what happens if we ask for no compression by using the identity encoding:

>>> req = Request('http://www.debian.org')
>>> req.add_header('Accept-Encoding', 'identity')
>>> response = urlopen(req)
>>> print(response.getheader('Content-Encoding'))
None

When a server uses the identity encoding type, no Content-Encoding header is included in the response.
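Since the Content-Encoding header may be gzip, another encoding, or missing altogether, it can be handy to wrap the decoding step in a helper. The decode_body() function here is our own sketch, not part of the standard library. It assumes that deflate bodies use the zlib format, so a server that sends a raw deflate stream would need extra handling.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Return the decoded bytes of a response body.

    content_encoding is the value of the Content-Encoding header,
    or None if the response did not include one.
    """
    if content_encoding is None:
        return body
    encoding = content_encoding.strip().lower()
    if encoding == 'identity':
        return body
    if encoding == 'gzip':
        return gzip.decompress(body)
    if encoding == 'deflate':
        # Assumes the zlib container format.
        return zlib.decompress(body)
    raise ValueError('unsupported Content-Encoding: %r' % content_encoding)
```

With a real response, we would call it as decode_body(response.read(), response.getheader('Content-Encoding')).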

Multiple values

To tell the server that we can accept more than one encoding, add more values to the Accept-Encoding header and separate them by commas. Let's try it. We create our Request object:

>>> req = Request('http://www.debian.org')

Then, we add our header, and this time we include more encodings:

>>> encodings = 'deflate, gzip, identity'
>>> req.add_header('Accept-Encoding', encodings)

Now, we submit the request and then check the response encoding:

>>> response = urlopen(req)
>>> response.getheader('Content-Encoding')
'gzip'

If needed, relative weightings can be given to specific encodings by adding a q value:

>>> encodings = 'gzip, deflate;q=0.8, identity;q=0.0'

The q value follows the encoding name, separated from it by a semicolon. The maximum q value is 1.0, and this is also the default if no q value is given. A q value of 0 means that the encoding is not acceptable. So, the preceding line should be interpreted as: my first preference for encoding is gzip, my second preference is deflate, and identity is not acceptable.
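To see how such a header breaks down, here is a minimal sketch of a parser that splits the value into (name, weight) pairs and orders them by weight. The parse_accept_encoding() function is our own helper for illustration. It does not validate the input the way a real server would.

```python
def parse_accept_encoding(value):
    """Split an Accept-Encoding value into (name, q) pairs, best first."""
    prefs = []
    for item in value.split(','):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(';')
        q = 1.0  # default weight when no q value is given
        for param in params.split(';'):
            key, _, val = param.strip().partition('=')
            if key.strip().lower() == 'q' and val:
                q = float(val)
        prefs.append((name.strip(), q))
    # sorted() is stable, so encodings with equal weights keep the
    # order in which the header listed them.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_accept_encoding('gzip, deflate;q=0.8, identity;q=0.0'))
# [('gzip', 1.0), ('deflate', 0.8), ('identity', 0.0)]
```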
