Chapter 2. HTTP and Working with the Web

The Hypertext Transfer Protocol (HTTP) is probably the most widely used application layer protocol. It was originally developed to allow academics to share HTML documents. Nowadays, it is used as the core protocol of innumerable applications across the Internet, and it is the principal protocol of the World Wide Web.

In this chapter, we will cover the following topics:

  • The HTTP protocol structure
  • Using Python for talking to services through HTTP
  • Downloading files
  • HTTP capabilities, such as compression and cookies
  • Handling errors
  • URLs
  • The Python standard library urllib package
  • Kenneth Reitz's third-party Requests package

The urllib package is the recommended Python standard library package for HTTP tasks. The standard library also has a low-level module called http. Although this offers access to almost all aspects of the protocol, it has not been designed for everyday use. The urllib package has a simpler interface, and it deals with everything that we are going to cover in this chapter.
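As a first taste of the urllib interface, the following minimal sketch builds a request object and inspects it without sending anything over the network. The helper names describe() and fetch_status() and the example URL are our own choices for illustration; only Request and urlopen come from urllib.request.

```python
from urllib.request import Request, urlopen


def describe(url):
    """Build a request object for url and report what would be sent.

    Nothing goes over the network here; we only inspect the object.
    """
    req = Request(url)
    return req.get_method(), req.type, req.host, req.selector


def fetch_status(url):
    """Send the request and return the numeric status code.

    This needs network access, so it is not called in this sketch.
    """
    with urlopen(url) as response:
        return response.status


# The URL below is only an illustrative placeholder.
print(describe("http://www.example.com/index.html"))
```

With no data attached, urllib defaults to the GET method, which is why describe() reports it without our asking for it.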

The third-party Requests package is a very popular alternative to urllib. It has an elegant interface and a powerful feature set, and it is a great tool for streamlining HTTP workflows. We'll be discussing how it can be used in place of urllib at the end of the chapter.

Request and response

HTTP is an application layer protocol, and it is almost always used on top of TCP. It has been deliberately defined to use a human-readable message format, but it can still be used for transporting arbitrary binary data.
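To see how readable the format is, here is a rough sketch that assembles the bytes of a simple HTTP 1.1 GET request by hand. The function name build_get_request() and the host are assumptions made for this illustration, and in practice urllib generates these bytes for us.

```python
def build_get_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP 1.1 GET request.

    Each line ends with CRLF, and an empty line marks the end of
    the headers. A GET request like this one carries no body.
    """
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, version
        f"Host: {host}",          # the Host header is mandatory in HTTP 1.1
        "Connection: close",      # ask the server to close the connection afterwards
        "",                       # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Show the message as text; www.example.com is only a placeholder host.
print(build_get_request("www.example.com").decode("ascii"))
```

The request line and headers are plain ASCII text, but anything that follows the blank line is treated as opaque bytes, which is how the same format can carry images or compressed data.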

An HTTP exchange consists of two elements: a request, made by the client, which asks the server for a particular resource specified by a URL, and a response, sent by the server, which supplies the resource that the client has asked for. If the server can't provide the resource that the client has requested, then the response will contain information about the failure.
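The following simplified sketch shows what a client finds in a response. It splits two hand-written sample responses, one successful and one reporting a failure, into a status line, headers, and a body. The parse_response() function and both sample messages are made up for this illustration; a real client library also handles details that are ignored here, such as repeated headers and chunked bodies.

```python
def parse_response(raw):
    """Split raw response bytes into version, status, reason, headers, body."""
    # The first blank line separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# Hand-written sample messages, not captured from a real server.
OK = (b"HTTP/1.1 200 OK\r\n"
      b"Content-Type: text/html\r\n"
      b"Content-Length: 13\r\n"
      b"\r\n"
      b"<h1>Hi!</h1>\n")
NOT_FOUND = b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"

for sample in (OK, NOT_FOUND):
    version, code, reason, headers, body = parse_response(sample)
    print(code, reason, headers, body)
```

The status code on the first line is how the server tells the client whether it could supply the resource. Codes in the 200 range indicate success, and codes in the 400 and 500 ranges indicate a failure.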

This order of events is fixed in HTTP. All interactions are initiated by the client. The server never sends anything to the client without the client explicitly asking for it.

This chapter will teach you how to use Python as an HTTP client. We will learn how to make requests to servers and then interpret their responses. We will look at writing server-side applications in Chapter 9, Applications for the Web.

By far the most widely used version of HTTP is 1.1, defined in RFCs 7230 to 7235. HTTP 2 is the latest version, which was officially ratified just as this book was going to press. Most of the semantics and syntax remain the same between versions 1.1 and 2; the main changes are in how the TCP connections are utilised. As of now, HTTP 2 isn't widely supported, so we will focus on version 1.1 in this book. If you do want to know more, HTTP 2 is documented in RFCs 7540 and 7541.

HTTP version 1.0, documented in RFC 1945, is still used by some older software. Version 1.1 is backwards-compatible with 1.0 though, and the urllib package and Requests both support HTTP 1.1, so when we're writing a client with Python, we don't need to worry about whether we're connecting to an HTTP 1.0 server. It's just that some of the more advanced features are not available. Almost all services nowadays use version 1.1, so we won't go into the differences here. The following Stack Overflow question is a good starting point if you need further information: http://stackoverflow.com/questions/246859/http-1-0-vs-1-1.
