You would like to write a simple HTTP client that fetches data from a web server using the native HTTP protocol. This can be the very first step toward creating your own HTTP browser.
Let us access www.python.org with our Pythonic minimal browser that uses Python's httplib.
Listing 4.1 shows the code for a simple HTTP client:
#!/usr/bin/env python
# Python Network Programming Cookbook -- Chapter - 4
# This program is optimized for Python 2.7.
# It may run on any other version with/without modifications.

import argparse
import httplib

REMOTE_SERVER_HOST = 'www.python.org'
REMOTE_SERVER_PATH = '/'


class HTTPClient:

    def __init__(self, host):
        self.host = host

    def fetch(self, path):
        http = httplib.HTTP(self.host)

        # Prepare the request line and headers
        http.putrequest("GET", path)
        http.putheader("User-Agent", __file__)
        http.putheader("Host", self.host)
        http.putheader("Accept", "*/*")
        http.endheaders()

        try:
            errcode, errmsg, headers = http.getreply()
        except Exception, e:
            # getreply() failed, so errcode, errmsg and headers are not
            # bound here; report the exception itself instead.
            print "Client failed: %s" % e
        else:
            print "Got homepage from %s" % self.host
            # Read the response body from the file-like object
            file = http.getfile()
            return file.read()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='HTTP Client Example')
    parser.add_argument('--host', action="store", dest="host",
                        default=REMOTE_SERVER_HOST)
    parser.add_argument('--path', action="store", dest="path",
                        default=REMOTE_SERVER_PATH)
    given_args = parser.parse_args()
    host, path = given_args.host, given_args.path
    client = HTTPClient(host)
    print client.fetch(path)
This recipe will by default fetch a page from www.python.org. You can run this recipe with or without the host and path arguments. If this script is run, it will show the following output:
$ python 4_1_download_data.py --host=www.python.org
Got homepage from www.python.org
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>Python Programming Language – Official Website</title>
....
If you run this recipe with an invalid path, it will show the following server response:
$ python 4_1_download_data.py --host='www.python.org' --path='/not-exist'
Got homepage from www.python.org
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>Page Not Found</title>
  <meta name="keywords" content="Page Not Found" />
  <meta name="description" content="Page Not Found" />
This recipe defines an HTTPClient class that fetches data from the remote host. It is built using Python's native httplib library. The fetch() method creates a bare-bones HTTP client with the HTTP() class and its helper methods, such as putrequest() and putheader(). It first sends the request line (GET followed by the path) and then sets the User-Agent header to the name of the current script (__file__), followed by the Host and Accept headers. A call to endheaders() completes the request.
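To make the role of putrequest() and putheader() more concrete, the following sketch assembles by hand roughly the text that such calls send to the server. It is an illustration only, not part of the recipe: the function name build_get_request and the default "HTTP/1.0" version string are assumptions made for this example, and the exact bytes httplib sends may differ between versions.

```python
def build_get_request(host, path, user_agent, version="HTTP/1.0"):
    """Assemble the text of a minimal GET request (illustrative helper)."""
    lines = [
        "GET %s %s" % (path, version),     # request line: method, path, version
        "User-Agent: %s" % user_agent,     # identifies the client program
        "Host: %s" % host,                 # names the server being addressed
        "Accept: */*",                     # any content type is acceptable
        "",                                # blank line marks the end of the headers
        "",
    ]
    # HTTP separates header lines with CRLF
    return "\r\n".join(lines)


if __name__ == "__main__":
    # repr() makes the CRLF line endings visible
    print(repr(build_get_request("www.python.org", "/", "4_1_download_data.py")))
```

The blank line after the last header is what tells the server that the request is complete, which is the job endheaders() performs in the recipe.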
The getreply() call, which reads the server's reply, is placed inside a try-except block. The response body is then obtained as a file-like object from the getfile() method, and the stream's content is read and returned.
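The listing targets Python 2.7. In Python 3, httplib was renamed http.client and the old HTTP() compatibility class is no longer available, so a reader on Python 3 might approximate the same flow as sketched below. This is not the recipe's code: the function signature, the port and timeout parameters, and the "minimal-client" User-Agent value are assumptions chosen for illustration. The sketch also returns the status code, since the recipe prints its success message even when the server answers with a "Page Not Found" document.

```python
import http.client


def fetch(host, path="/", port=80, timeout=10):
    """Send a GET request and return (status, reason, body).

    Hypothetical Python 3 counterpart of HTTPClient.fetch() from the recipe.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # request() sends the request line and headers; the Host header
        # is added automatically by http.client.
        conn.request("GET", path, headers={
            "User-Agent": "minimal-client",  # illustrative value
            "Accept": "*/*",
        })
        response = conn.getresponse()
        # The body is returned as bytes
        return response.status, response.reason, response.read()
    finally:
        conn.close()
```

A caller could then unpack the result, for example status, reason, body = fetch('www.python.org', '/not-exist'), and compare status against 200 before treating the body as the requested page. Connection problems surface as OSError or http.client.HTTPException and can be caught in a try-except block, as in the recipe.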