Cookies

A cookie is a small piece of data that the server sends in a Set-Cookie header as a part of the response. The client stores cookies locally and includes them in any future requests that are sent to the server.
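
To make the header concrete, here is a small sketch that parses a Set-Cookie value with the standard library's http.cookies module. The header value itself (the session_id name and the abc123 value) is made up for illustration and does not come from any real site.

```python
from http.cookies import SimpleCookie

# A made-up Set-Cookie header value, as a server might send it.
header_value = 'session_id=abc123; Path=/; HttpOnly'

parsed = SimpleCookie()
parsed.load(header_value)

# Each cookie in the header becomes a Morsel, keyed by the cookie's name.
morsel = parsed['session_id']
print(morsel.value)               # the cookie's value
print(morsel['path'])             # the Path attribute
print(bool(morsel['httponly']))   # whether the HttpOnly flag was present
```

SimpleCookie is mostly aimed at server-side code; for client-side handling, the http.cookiejar module discussed in this section is usually the better fit.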

Servers use cookies in various ways. A server can put a unique ID in a cookie, which enables it to track a client as it accesses different areas of a site. It can store a login token, which will automatically log the client in, even if the client leaves the site and then accesses it later. Cookies can also be used for storing the client's user preferences or snippets of personalizing information, and so on.

Cookies are necessary because the server has no other way of tracking a client between requests. HTTP is called a stateless protocol. It doesn't contain an explicit mechanism for a server to know for sure that two requests have come from the same client. Without cookies to allow the server to add some uniquely identifying information to the requests, things such as shopping carts (which were the original problem that cookies were developed to solve) would become impossible to build, because the server would not be able to determine which basket goes with which request.

We may need to handle cookies in Python because without them, some sites don't behave as expected. When using Python, we may also want to access the parts of a site which require a login, and the login sessions are usually maintained through cookies.

Cookie handling

We're going to discuss how to handle cookies with urllib. First, we need to create a place for storing the cookies that the server will send us:

>>> from http.cookiejar import CookieJar
>>> cookie_jar = CookieJar()

Next, we build something called a urllib opener. This will automatically extract the cookies from the responses that we receive and then store them in our cookie jar:

>>> from urllib.request import build_opener, HTTPCookieProcessor
>>> opener = build_opener(HTTPCookieProcessor(cookie_jar))

Then, we can use our opener to make an HTTP request:

>>> opener.open('http://www.github.com')

Lastly, we can check that the server has sent us some cookies:

>>> len(cookie_jar)
2

Whenever we use opener to make further requests, the HTTPCookieProcessor functionality will check our cookie_jar to see if it contains any cookies for that site and then it will automatically add them to our requests. It will also add any further cookies that are received to the cookie jar.
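
This round trip can be observed without touching a real site. The following is only a sketch: it starts a throwaway HTTP server on the local machine that sets a cookie on the first request and echoes back whatever Cookie header it receives afterwards. The EchoCookieHandler class, the run_demo function, and the visitor=abc123 cookie are all hypothetical names invented for this illustration.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener


class EchoCookieHandler(BaseHTTPRequestHandler):
    """Toy server: sets a cookie on first contact, then echoes what it gets back."""

    def do_GET(self):
        received = self.headers.get('Cookie', '')
        body = received.encode('ascii')
        self.send_response(200)
        if not received:
            # Hypothetical cookie name and value, for illustration only.
            self.send_header('Set-Cookie', 'visitor=abc123; Path=/')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def run_demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(('127.0.0.1', 0), EchoCookieHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = 'http://127.0.0.1:{}/'.format(server.server_port)

    cookie_jar = CookieJar()
    # ProxyHandler({}) bypasses any proxy configured in the environment.
    opener = build_opener(ProxyHandler({}), HTTPCookieProcessor(cookie_jar))
    try:
        with opener.open(url) as response:
            first = response.read().decode('ascii')
        with opener.open(url) as response:
            second = response.read().decode('ascii')
    finally:
        server.shutdown()
        server.server_close()
    return first, second, len(cookie_jar)


first_body, second_body, stored = run_demo()
print(repr(first_body))   # nothing was sent on the first request
print(repr(second_body))  # the cookie came back on the second request
print(stored)             # number of cookies now in the jar
```

The first response body is empty because the jar had nothing to send; the second contains the cookie that HTTPCookieProcessor added to the request on our behalf.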

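The http.cookiejar module also contains a FileCookieJar class, which works in the same way as CookieJar but adds an interface for saving the cookies to a file and loading them back. FileCookieJar itself doesn't implement the saving; in practice, we use one of its subclasses, MozillaCookieJar or LWPCookieJar. This allows cookies to persist across Python sessions.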
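
As a rough sketch of persistence, the following saves a cookie with LWPCookieJar (a subclass of FileCookieJar in http.cookiejar) and loads it into a fresh jar. To keep the example offline, the cookie is built by hand rather than received from a server; the make_cookie helper, the .example.com domain, and the temporary file name are assumptions made for this illustration.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_cookie(name, value, domain, expires):
    """Build a Cookie by hand; normally the server's response supplies these."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path='/', path_specified=True,
        secure=False, expires=expires, discard=False,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical location for the cookie file.
cookie_file = os.path.join(tempfile.mkdtemp(), 'cookies.txt')

saved_jar = LWPCookieJar(cookie_file)
saved_jar.set_cookie(make_cookie('logged_in', 'no', '.example.com', 2060882017))
saved_jar.save()

# A fresh jar, as we would have in a later Python session.
loaded_jar = LWPCookieJar(cookie_file)
loaded_jar.load()
names = [cookie.name for cookie in loaded_jar]
print(names)
```

By default, save() skips session cookies (those marked to be discarded) and cookies that have already expired; passing ignore_discard=True or ignore_expires=True changes that.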

Know your cookies

It's worth looking at the properties of cookies in more detail. Let's examine the cookies that GitHub sent us in the preceding section.

To do this, we need to pull the cookies out of the cookie jar. The CookieJar class doesn't let us access them directly, but it supports the iterator protocol. So, a quick way of getting them is to create a list from it:

>>> cookies = list(cookie_jar)
>>> cookies
[Cookie(version=0, name='logged_in', value='no', ...),
 Cookie(version=0, name='_gh_sess', value='eyJzZxNzaW9uX...', ...)
]

You can see that we have two Cookie objects. Now, let's pull out some information from the first one:

>>> cookies[0].name
'logged_in'
>>> cookies[0].value
'no'

The cookie's name allows the server to quickly reference it. This cookie is clearly a part of the mechanism that GitHub uses for finding out whether we've logged in yet. Next, let's do the following:

>>> cookies[0].domain
'.github.com'
>>> cookies[0].path
'/'

The domain and the path are the areas for which this cookie is valid. The leading dot in the domain means that our urllib opener will include this cookie in any request that it sends to github.com or to any of its subdomains, such as www.github.com, where the path is at or below the root.
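
The matching rules can be tried out offline. This sketch builds a cookie by hand with the same name, domain, and path as the one above, and then asks the jar which Cookie header it would attach to requests for a few URLs. No request is actually sent. The cookie_header_for helper and the URLs are illustrative assumptions, and the cookie is created with secure=False here so that plain HTTP URLs qualify.

```python
from http.cookiejar import Cookie, CookieJar
from urllib.request import Request

jar = CookieJar()
jar.set_cookie(Cookie(
    version=0, name='logged_in', value='no',
    port=None, port_specified=False,
    domain='.github.com', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True,
    secure=False,  # simplified for this sketch so HTTP requests qualify
    expires=2060882017, discard=False,
    comment=None, comment_url=None, rest={},
))


def cookie_header_for(url):
    """Return the Cookie header the jar would add for url, or None."""
    request = Request(url)          # building a Request does not open a connection
    jar.add_cookie_header(request)
    return request.get_header('Cookie')


for url in ('http://github.com/',
            'http://www.github.com/explore',
            'http://example.com/'):
    print(url, '->', cookie_header_for(url))
```

Only the first two URLs fall inside the cookie's domain, so only they receive the header.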

Now, let's look at the cookie's lifetime:

>>> cookies[0].expires
2060882017

This is a Unix timestamp; we can convert it to a datetime object:

>>> import datetime
>>> datetime.datetime.fromtimestamp(cookies[0].expires)
datetime.datetime(2035, 4, 22, 20, 13, 37)

So, our cookie will expire on the 22nd of April, 2035. The expiry date tells the client how long the server would like it to hold on to the cookie. Once the expiry date has passed, the client can throw the cookie away, and the server will send a new one in its next response. Of course, there's nothing to stop a client from throwing the cookie away immediately, though on some sites this may break functionality that depends on the cookie.
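
One detail worth noting is that fromtimestamp() without a tz argument returns local time, so the hour we see depends on the machine's time zone. The following sketch converts the same timestamp to UTC instead and checks whether it has already passed; the variable names are our own.

```python
import datetime
import time

expires = 2060882017  # the timestamp shown above

# Passing tz gives an unambiguous, timezone-aware result.
expires_utc = datetime.datetime.fromtimestamp(expires, tz=datetime.timezone.utc)
print(expires_utc.isoformat())

# A cookie is past its expiry once the current time reaches the timestamp.
is_expired = expires <= time.time()
print(is_expired)
```

Cookie objects also provide an is_expired() method that performs this comparison for us.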

Let's discuss two common cookie flags:

>>> print(cookies[0].get_nonstandard_attr('HttpOnly'))
None

Cookies that are stored on a client can be accessed in a number of ways:

  • By the client as part of an HTTP request and response sequence
  • By scripts running in the client, such as JavaScript
  • By other processes running in the client, such as Flash

The HttpOnly flag indicates that the client should only allow access to a cookie when the access is part of an HTTP request or response. The other methods should be denied access. This protects the client against cross-site scripting attacks (see Chapter 9, Applications for the Web, for more information on these). This is an important security feature, and when the server sets it, our application should behave accordingly.
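
A caveat when checking for this flag: HttpOnly carries no value, and our understanding is that http.cookiejar keeps such an attribute among the cookie's non-standard attributes with a value of None. If so, get_nonstandard_attr() returns None whether or not the flag was set, and has_nonstandard_attr() is the more reliable test. The sketch below builds two cookies by hand under that assumption; the make_cookie helper is hypothetical, and the attribute name is assumed to keep the capitalization that the server used.

```python
from http.cookiejar import Cookie


def make_cookie(rest):
    """Build a cookie by hand; rest holds the non-standard attributes."""
    return Cookie(
        version=0, name='logged_in', value='no',
        port=None, port_specified=False,
        domain='.github.com', domain_specified=True, domain_initial_dot=True,
        path='/', path_specified=True,
        secure=True, expires=2060882017, discard=False,
        comment=None, comment_url=None, rest=rest,
    )


# Assumption: a value-less HttpOnly attribute is stored with the value None.
with_flag = make_cookie({'HttpOnly': None})
without_flag = make_cookie({})

for cookie in (with_flag, without_flag):
    print(cookie.has_nonstandard_attr('HttpOnly'),
          cookie.get_nonstandard_attr('HttpOnly'))
```

Both cookies give None from get_nonstandard_attr(), but only the first reports the attribute as present.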

There is also a Secure flag:

>>> cookies[0].secure
True

If the value is true, the Secure flag indicates that the cookie should only ever be sent over a secure connection, such as HTTPS. Again, we should honor this flag when it has been set: when our application sends requests containing this cookie, it should only send them to HTTPS URLs.
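
The cookie jar honors this flag for us. As a minimal offline sketch, the following builds a cookie by hand with secure=True and asks the jar which Cookie header it would add to an HTTP request and to an HTTPS request. Nothing is sent over the network; the secure_header_for helper is a hypothetical name.

```python
from http.cookiejar import Cookie, CookieJar
from urllib.request import Request

secure_jar = CookieJar()
secure_jar.set_cookie(Cookie(
    version=0, name='logged_in', value='no',
    port=None, port_specified=False,
    domain='.github.com', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True,
    secure=True,  # the flag under discussion
    expires=2060882017, discard=False,
    comment=None, comment_url=None, rest={},
))


def secure_header_for(url):
    """Return the Cookie header the jar would add for url, or None."""
    request = Request(url)          # building a Request does not open a connection
    secure_jar.add_cookie_header(request)
    return request.get_header('Cookie')


print(secure_header_for('http://github.com/'))   # plain HTTP
print(secure_header_for('https://github.com/'))  # HTTPS
```

The cookie is withheld from the plain HTTP request and attached only to the HTTPS one.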

You may have spotted an inconsistency here. We requested the page over HTTP, yet the server has sent us a cookie that it wants to be sent only over secure connections. Surely the site designers didn't overlook a security loophole like that? Rest assured; they didn't. The response was actually sent over HTTPS. But how did that happen? Well, the answer lies with redirects.
