Redirects

Sometimes servers move their content around. They also make some content obsolete and put up new stuff in a different location. Sometimes they'd like us to use the more secure HTTPS protocol instead of HTTP. In all these cases, they may get traffic that asks for the old URLs, and in all these cases they'd probably prefer to be able to automatically send visitors to the new ones.

The 300 range of HTTP status codes is designed for this purpose. These codes indicate to the client that further action is required on their part to complete the request. The most commonly encountered action is to retry the request at a different URL. This is called a redirect.

We'll learn how this works when using urllib. Let's make a request:

>>> from urllib.request import Request, urlopen
>>> req = Request('http://www.gmail.com')
>>> response = urlopen(req)

Simple enough, but now, look at the URL of the response:

>>> response.url
'https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false...'

This is not the URL that we requested! If we open this new URL in a browser, then we'll see that it's actually the Google login page (you may need to clear your browser cookies to see this if you already have a cached Google login session). Google redirected us from http://www.gmail.com to its login page, and urllib automatically followed the redirect. Moreover, we may have been redirected more than once. Look at the redirect_dict attribute of our request object:

>>> req.redirect_dict
{'https://accounts.google.com/ServiceLogin?service=...': 1, 'https://mail.google.com/mail/': 1}

The urllib package adds every URL that we were redirected through to this dict. We can see that we have actually been redirected twice, first to https://mail.google.com, and second to the login page.
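To watch a redirect chain without depending on a live site, the following sketch starts a throwaway local server and follows its redirects. The paths /a, /b, and /c, the ChainHandler class, and the follow_chain function are all hypothetical names made up for this illustration. Note also that redirect_dict is an undocumented attribute, so this shows current behavior rather than a stable API.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical redirect map for the demo server: /a -> /b -> /c.
HOPS = {'/a': '/b', '/b': '/c'}


class ChainHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = HOPS.get(self.path)
        if target is not None:
            # Redirect: a 302 status plus a Location header.
            self.send_response(302)
            self.send_header('Location', target)
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'final page'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def follow_chain():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(('127.0.0.1', 0), ChainHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = 'http://127.0.0.1:%d' % server.server_port
    # An empty ProxyHandler keeps the request away from any configured proxy.
    opener = build_opener(ProxyHandler({}))
    try:
        req = Request(base + '/a')
        with opener.open(req) as response:
            final_path = response.url[len(base):]
        # Keys of redirect_dict are the absolute URLs we were sent to.
        hops = [url[len(base):] for url in req.redirect_dict]
    finally:
        server.shutdown()
        server.server_close()
    return final_path, hops
```

Calling follow_chain() should return the final path together with the intermediate paths in the order they were visited, since dictionaries keep insertion order in Python 3.7 and later.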

When we send our first request, the server sends a response with a redirect status code, one of 301, 302, 303, or 307 (Python 3.11 and later also handle 308). All of these indicate a redirect. This response includes a Location header, which contains the new URL. The urllib package will submit a new request to that URL, and in the aforementioned case, it will receive yet another redirect, which will lead it to the Google login page.
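If we want to see the redirect response itself, one approach is to install a redirect handler that declines to follow it. The sketch below is a minimal illustration under stated assumptions: the local server, its /old and /new paths, and the names OldPathHandler, NoRedirect, and inspect_redirect are all hypothetical. It relies on redirect_request() returning None, in which case urllib falls back to raising an HTTPError that carries the status code and headers.

```python
import threading
import urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import HTTPRedirectHandler, ProxyHandler, build_opener


class OldPathHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old permanently redirects to /new."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(301)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'new content'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


class NoRedirect(HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # decline to build a follow-up request


def inspect_redirect():
    server = HTTPServer(('127.0.0.1', 0), OldPathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = 'http://127.0.0.1:%d/old' % server.server_port
    # NoRedirect replaces the default redirect handler; the empty
    # ProxyHandler bypasses any proxy configured in the environment.
    opener = build_opener(ProxyHandler({}), NoRedirect)
    try:
        opener.open(url)
    except urllib.error.HTTPError as e:
        # The unfollowed redirect surfaces as an HTTPError.
        result = (e.code, e.headers['Location'])
        e.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
    return None
```

Here inspect_redirect() returns the status code and the raw Location header exactly as the server sent them, which is the information urllib normally consumes on our behalf before issuing the second request.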

Since urllib follows redirects for us, they generally don't affect us, but it's worth knowing that the response urllib returns may be for a URL different from the one we requested. Also, if we hit too many redirects for a single request (more than 10 for urllib), then urllib will give up and raise a urllib.error.HTTPError exception.
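The redirect limit can be observed with a local server that never stops redirecting. This is only a sketch: the /hop/N paths, the EndlessHandler class, and the hit_redirect_limit function are hypothetical names chosen for the example, and the server is assumed to be reachable on the loopback interface.

```python
import threading
import urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class EndlessHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /hop/N always redirects to /hop/N+1."""

    def do_GET(self):
        try:
            n = int(self.path.rsplit('/', 1)[-1])
        except ValueError:
            n = 0
        self.send_response(302)
        self.send_header('Location', '/hop/%d' % (n + 1))
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def hit_redirect_limit():
    server = HTTPServer(('127.0.0.1', 0), EndlessHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = 'http://127.0.0.1:%d/hop/0' % server.server_port
    # The empty ProxyHandler bypasses any proxy set in the environment.
    opener = build_opener(ProxyHandler({}))
    try:
        opener.open(url)
    except urllib.error.HTTPError as e:
        # urllib gave up; the error carries the last redirect's status code.
        code = e.code
        e.close()
        return code
    finally:
        server.shutdown()
        server.server_close()
    return None
```

In real code, the same except clause is where we would decide whether to report the failure or retry against a different URL.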
